On 3-years of visXcerpt Demos
RDFS Reasoning in Xcerpt
RDF Normalization and Parsing in Xcerpt
Processing RSS Feeds in Xcerpt
Recursive Tree Transformations in Xcerpt
XML Query Use Cases in Xcerpt
RDF Use Cases in Xcerpt
GData and Atom Processing in Xcerpt
Fibonacci? Lazy Generative Arithmetics
RDFS entailment (or reasoning) in Xcerpt is
provided by a set of rules described in the
following.
In 2004/2005 we started to work on a better
integration of RDF into Xcerpt. Xcerpt was
conceived from the very beginning as a versatile
query language capable of accessing any form of
semi-structured data. In practice, however, each
format has its own specific challenges.
Accessing RDF in a general semi-structured query
language such as Xcerpt requires the parsing and
normalization of RDF serializations.
Unfortunately, there are many different
serializations for RDF (cf. Oliver Bolzer's
master's thesis for an overview). Furthermore, the
W3C-recommended serialization, RDF/XML, is often
criticized for its unnecessary complexity and
representational variety. To access data in
RDF/XML we present a set of rules that normalize
that data into an easy-to-understand internal
triple representation (similar to RXR or
N-Triples).
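To make the idea of normalization concrete, here is a minimal sketch in Python (not Xcerpt, and not the rule set the article presents). It assumes a tiny subset of RDF/XML: top-level node elements with `rdf:about`, and property elements that carry either an `rdf:resource` attribute or literal text. The function name `normalize`, the helper `_uri`, and the `example.org` data are hypothetical. The point is only that two different RDF/XML spellings of the same graph collapse into the same set of triples.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def _uri(tag):
    # ElementTree writes qualified names as "{namespace}local".
    return tag[1:].replace("}", "", 1)

def normalize(rdf_xml):
    """Flatten a small subset of RDF/XML into a set of (s, p, o) triples."""
    triples = set()
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF}}}about")
        # A typed node element is shorthand for rdf:Description plus rdf:type.
        if node.tag != f"{{{RDF}}}Description":
            triples.add((subject, RDF + "type", _uri(node.tag)))
        for prop in node:
            # The object is either a resource reference or a quoted literal.
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:
                obj = '"' + (prop.text or "").strip() + '"'
            triples.add((subject, _uri(prop.tag), obj))
    return triples

# Two spellings of the same two statements (hypothetical example data).
VARIANT_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/alice">
    <rdf:type rdf:resource="http://example.org/Person"/>
    <ex:name>Alice</ex:name>
  </rdf:Description>
</rdf:RDF>"""

VARIANT_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <ex:Person rdf:about="http://example.org/alice">
    <ex:name>Alice</ex:name>
  </ex:Person>
</rdf:RDF>"""

if __name__ == "__main__":
    for triple in sorted(normalize(VARIANT_B)):
        print(triple)
```

Full RDF/XML has many more abbreviations (nested descriptions, property attributes, containers, blank nodes) that this sketch deliberately ignores.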
RSS has become the foremost standard for
providing news feeds on changes, additions, or
updates to Web sites.
It is fairly easy to access RSS data in Xcerpt,
e.g., to aggregate feeds from different Web sites
or to enrich given information with data from any
feed. This article briefly summarizes some sample
Xcerpt applications using RSS data.
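As an illustration of feed aggregation, the following Python sketch (a stand-in, not one of the Xcerpt applications summarized here) merges the items of several RSS 2.0 documents and orders them newest first. The function names `items` and `aggregate` and the two sample feeds are hypothetical, and the sketch assumes every item carries a `pubDate` in the usual RFC 822 form.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def items(rss_xml):
    """Yield one dict per <item>, tagged with the channel it came from."""
    channel = ET.fromstring(rss_xml).find("channel")
    source = channel.findtext("title")
    for item in channel.iter("item"):
        yield {
            "source": source,
            "title": item.findtext("title"),
            "date": parsedate_to_datetime(item.findtext("pubDate")),
        }

def aggregate(*feeds):
    """Merge the items of all feeds, newest first."""
    merged = [entry for feed in feeds for entry in items(feed)]
    return sorted(merged, key=lambda entry: entry["date"], reverse=True)

# Hypothetical sample feeds.
FEED_A = """<rss version="2.0"><channel><title>Site A</title>
  <item><title>A1</title><pubDate>Mon, 06 Mar 2006 10:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Site B</title>
  <item><title>B1</title><pubDate>Tue, 07 Mar 2006 09:00:00 GMT</pubDate></item>
  <item><title>B2</title><pubDate>Sun, 05 Mar 2006 08:00:00 GMT</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in aggregate(FEED_A, FEED_B):
        print(entry["date"].date(), entry["source"], entry["title"])
```

Enriching other data with feed content would follow the same pattern: extract the items once, then join them against whatever else is being queried.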
Xcerpt's patterns are great at extracting
(complex, interrelated) information from
semi-structured data and reassembling that data
in new ways. However, there is another common
task when processing semi-structured data:
transformations where the result is closely
related (at least) in structure to the input but
some changes to the data are made.
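The kind of task meant here can be sketched in Python (again as a stand-in for Xcerpt, not the article's solution): copy a tree node by node, keeping its structure, while changing only selected parts. The function `transform`, its `rename` mapping, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def transform(node, rename):
    """Return a structural copy of node with element names mapped via rename."""
    copy = ET.Element(rename.get(node.tag, node.tag), dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    # Recurse so the change applies at every depth, not just the top level.
    for child in node:
        copy.append(transform(child, rename))
    return copy

# Hypothetical input: titles occur at two different depths.
DOC = ET.fromstring(
    "<book><title>Xcerpt</title>"
    "<chapter><title>Intro</title><para>Text</para></chapter></book>"
)

if __name__ == "__main__":
    result = transform(DOC, {"title": "heading"})
    print(ET.tostring(result, encoding="unicode"))
```

The recursion is what makes this awkward for a purely pattern-based language: the rule has to reproduce everything it does not change, at arbitrary depth.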
As part of the ongoing effort on XQuery, the W3C
has released a set of use cases for XML query
languages. Though the selection is arguably
somewhat biased toward the features of XQuery and
the expectations of the corresponding working
group, it is nevertheless a very illuminating set
of use cases for evaluating the adequacy of any
XML query language. The remainder of this article
describes our solutions for these use cases in
Xcerpt.
There are two sets of solutions for these use
cases. The first set is part of Sebastian Kraus'
master's thesis. Though some bugs remain and some
solutions are missing, this set can be tested
with the 2004 Xcerpt prototype. You can find the
corresponding Xcerpt programs at
http://svn.amachos.com/xcerpt/applications/2005/use-cases/w3c-xml-query/.
More details on these solutions are given in
Sebastian's master's thesis.
At the start of the W3C activity on standardizing
a query or data access language for RDF, a set of
use cases for such a language was published.
We have started implementing some of these use
cases in Xcerpt, as described in the following.
The RDF Data Access Use Cases and Requirements
specification contains a number of use cases for
RDF query (or data access) languages. These use
cases are, however, notably different from the
ones published by the W3C for XML Query: the XML
Query use cases are rather precise, defining both
input data and desired output. In contrast, the
RDF use cases are far less precise. They are
often rather vague descriptions of settings and
application scenarios, interspersed with
irrelevant information. Nevertheless, we have
started to implement some of these while also
developing a (small) number of our own use and
demonstration cases more in the spirit of the XML
Query use cases.
Touted as the successor to the widely deployed,
but technically rather unsatisfying RSS, Atom is
a recent IETF standard for Web feeds. Together
with the Atom publishing protocol it allows not
only access to, but also creation, change, and
deletion of entries in Web sites such as blogs or
other collections of data. Indeed, Google uses an
extension of Atom, dubbed GData, as the API for its
calendar service GCalendar. In the following,
access to GCalendar from Xcerpt for structured
display of upcoming events is discussed.
Xcerpt is rather obviously Turing-complete. So
let's solve some problems that are unusual for a
query language, but standard tasks for general
programming languages. The first problem
discussed in this series is a rule set for
computing the Fibonacci numbers.
Such a rule-set might actually be useful in a
query language, e.g., to test whether a set of
queried data values corresponds to a Fibonacci
distribution. However, there are a number of
problems with a program for generating Fibonacci
numbers in Xcerpt. These problems are discussed
in the following.
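For orientation, here is how the task looks in a general programming language. This is a Python sketch, not the Xcerpt rule set, and it does not reproduce the problems the article goes on to discuss. It assumes that the sequence should be produced lazily, because it is unbounded; the names `fibonacci` and `is_fibonacci` are hypothetical.

```python
from itertools import islice

def fibonacci():
    """Lazily yield 0, 1, 1, 2, 3, 5, ... without ever terminating."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def is_fibonacci(n):
    """Check membership by generating only as far as needed."""
    for f in fibonacci():
        if f >= n:
            return f == n

if __name__ == "__main__":
    print(list(islice(fibonacci(), 10)))
    print([v for v in (8, 9, 13, 22) if is_fibonacci(v)])
```

Because the generator only computes values on demand, the consumer decides how much of the infinite sequence is ever materialized, which is the property a rule-based formulation has to recover by other means.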