Photo: Flickr user lifeontheedge

Sunday, April 20, 2008

WikiXMLDB provides a way of querying Wikipedia with XQuery.

For all the benefits that Wikipedia promises, it is not easy to use off the shelf in applications. While Wikipedia is available for download in an XML format, the individual articles inside it are formatted in a proprietary wiki format. So the most interesting uses of Wikipedia in applications are still locked away behind these access problems.

This is where WikiXMLDB comes to the rescue. We have parsed the entire English Wikipedia content into an XML representation (its total size is about 21GB), loaded it into Sedna, and provided a query interface to it. Now you can dissect individual articles and pull out abstracts, sections, links, infoboxes, and other components. Or you can combine pieces of existing documents into new XML documents and convert them to web pages, for example with XSLT. And you can do it all using the standard W3C XQuery language. So finally you can start enriching your content with data from Wikipedia and unlock its power for your applications.
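
To get a feel for what "dissecting" an article means once it is real XML, here is a minimal Python sketch using only the standard library. It does not talk to WikiXMLDB or Sedna, and it is not XQuery; the element names (`article`, `abstract`, `section`, `link`) and the sample document are assumptions made up for illustration, so the actual WikiXMLDB schema may well look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical article shape, invented for this sketch.
# The real WikiXMLDB schema may use different element names.
SAMPLE = """\
<article>
  <title>Example article</title>
  <abstract>A short abstract.</abstract>
  <section>
    <title>Overview</title>
    <p>See <link target="XQuery">XQuery</link> and <link target="XML">XML</link>.</p>
  </section>
  <section>
    <title>Details</title>
    <p>Pages can be rendered with <link target="XSLT">XSLT</link>.</p>
  </section>
</article>
"""


def dissect(xml_text):
    """Pull the title, abstract, section titles, and link targets out of one article."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "abstract": root.findtext("abstract"),
        # Only direct <section> children of the article, in document order.
        "sections": [s.findtext("title") for s in root.findall("section")],
        # Every <link> anywhere in the article, in document order.
        "links": [link.get("target") for link in root.iter("link")],
    }


def to_summary(parts):
    """Combine extracted pieces into a new, smaller XML document."""
    summary = ET.Element("summary")
    ET.SubElement(summary, "title").text = parts["title"]
    ET.SubElement(summary, "abstract").text = parts["abstract"]
    return ET.tostring(summary, encoding="unicode")


parts = dissect(SAMPLE)
print(parts["sections"])
print(parts["links"])
print(to_summary(parts))
```

The same two steps, selecting components by path and building a new document from them, are what an XQuery expression against the database would do in a single query.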

Cool.
