Processing RSS Feeds in Xcerpt
RSS has become the foremost standard for providing news feeds on changes, additions, or updates to Web sites. It is fairly easy to access RSS data in Xcerpt, e.g., to aggregate feeds from different Web sites or to enrich given information with data from any feed. This article briefly summarizes some sample Xcerpt applications using RSS data.
Unfortunately, RSS has never been properly standardized. Rather, several competing variants of RSS have surfaced over recent years, some based on RDF, some not. All of them are to some extent incompatible with each other.
At http://svn.amachos.com/xcerpt/applications/2004/rss-access/ you will find several Xcerpt programs for accessing the RSS feeds of different Web sites (mostly news sites such as Slashdot or the German television news site Tagesschau):
rule-RSS-Heise- List current news from Heise's news ticker, unfortunately in very early Xcerpt syntax.
rule-RSS-Slashdot- List current news from Slashdot, again unfortunately in very early Xcerpt syntax.
rule-RSS-Sueddeutsche- List current news from the Sueddeutsche Zeitung, again unfortunately in very early Xcerpt syntax.
rule-RSS-DN- List current news from the Swedish Dagens Nyheter; currently the only one of the examples that works with the 2004 prototype.
rule-RSS-Tagesschau- List current news from the German Tagesschau news ticker. Although it runs with the 2004 Xcerpt prototype, it fails to produce results: Tagesschau has moved from the RDF-based RSS 1.0 to (non-RDF) RSS 2.0, but the Xcerpt program still assumes RSS 1.0.
Eventually, we would like to develop an RSS library that allows access to RSS feeds in any of the published variants but lets the query writer be agnostic of the actual format used. If you are interested in such work (in particular as a student of the University of Munich), please contact Tim Furche or any member of the Xcerpt team.
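To make the idea of format-agnostic access concrete, here is a minimal sketch in Python rather than Xcerpt. It is not part of the Xcerpt programs listed above; the function name read_items and the two sample documents are hypothetical and only illustrate how the RDF-based RSS 1.0 layout (namespaced items as children of rdf:RDF) and the RSS 2.0 layout (un-namespaced items inside channel) could be mapped onto the same result.

```python
import xml.etree.ElementTree as ET

RSS1_NS = "http://purl.org/rss/1.0/"  # namespace used by RDF-based RSS 1.0


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def read_items(xml_text):
    """Return (title, link) pairs from an RSS 1.0 or RSS 2.0 document.

    Hypothetical helper: the caller never needs to know which variant
    the feed uses.
    """
    root = ET.fromstring(xml_text)
    if local_name(root.tag) == "RDF":
        # RSS 1.0: items are namespaced siblings of <channel> under rdf:RDF.
        prefix = "{" + RSS1_NS + "}"
        items = root.findall(prefix + "item")
    elif root.tag == "rss":
        # RSS 2.0: items are un-namespaced children of <channel>.
        prefix = ""
        items = root.findall("channel/item")
    else:
        raise ValueError("unrecognized feed format: " + root.tag)
    return [
        (item.findtext(prefix + "title", default=""),
         item.findtext(prefix + "link", default=""))
        for item in items
    ]


# Made-up sample feeds carrying the same news item in both variants.
RSS10_SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed"><title>Example</title></channel>
  <item rdf:about="http://example.org/a">
    <title>First</title><link>http://example.org/a</link>
  </item>
</rdf:RDF>"""

RSS20_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
  <item><title>First</title><link>http://example.org/a</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print(read_items(RSS10_SAMPLE))
    print(read_items(RSS20_SAMPLE))
```

A program written against only the first branch would silently return an empty list for an RSS 2.0 feed, which is the kind of failure described for the Tagesschau example.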
Very recently, we have also developed an Xcerpt application for accessing Atom feeds, a new IETF standard for event feeds and fine-grained information management. It accesses Google Calendar data to provide information about the PMS advanced seminar, cf. [gdata-in-xcerpt].
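For readers unfamiliar with Atom, the following Python sketch (again not Xcerpt, and not the application referenced above) shows the basic shape of an Atom feed: entries live in the http://www.w3.org/2005/Atom namespace, and links are carried in an href attribute rather than as element text. The function name read_entries and the sample feed are assumptions made purely for illustration; real calendar feeds contain many more elements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def read_entries(xml_text):
    """Return (title, href) pairs for each entry of an Atom feed.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        # Unlike RSS, Atom stores the link target in an attribute.
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((title, href))
    return entries


# Made-up sample feed with a single seminar entry.
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Seminar</title>
  <entry>
    <title>Talk 1</title>
    <link href="http://example.org/talk1"/>
    <updated>2006-01-01T00:00:00Z</updated>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(read_entries(ATOM_SAMPLE))
```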