Dagstuhl Seminar 01361
Foundations of Semistructured Data
( Sep 02 – Sep 07, 2001 )
Organizers
- Alberto O. Mendelzon (University of Toronto, CA)
- Thomas Schwentick (TU Dortmund, DE)
- Dan Suciu (University of Washington - Seattle, US)
Traditional database systems rely on a long-established model: the relational data model. When it was proposed in the early 1970s by Codd, a logician, the relational model started a true revolution in data management. In this simple model, data is represented as relations in first-order structures and queries as first-order logic formulas. It enabled researchers and implementors to separate the logical aspect of the data from its physical implementation. Thirty years of research and development followed, leading to today's mature, high-performance relational database systems.
The age of the Internet brought new data management applications and challenges. Data is now accessed over the Web and is available in a variety of formats, including HTML, XML, and several application-specific data formats. Often data is mixed with free text, and the boundary between data and text is sometimes blurred. The way the data can be retrieved also varies considerably: some instances can be downloaded entirely, while others can only be accessed through limited capabilities. To accommodate all forms and kinds of data, the database research community has introduced the "semistructured data model", where data is self-describing, irregular, and graph-like. The new model naturally captures Web data, such as HTML, XML, or other application-specific formats.
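To make "self-describing, irregular, and graph-like" concrete, here is a minimal sketch in Python. It is an illustration only, not a formalism taken from the seminar: the sample data, the nested-dictionary encoding (a tree-shaped special case of an edge-labeled graph), and the helper names `children` and `follow_path` are all assumptions made for this example.

```python
# Hypothetical sample data: two "person" entries with different fields.
# Labels travel with the data (self-describing), and the two entries do
# not share a common schema (irregular).
db = {
    "person": [
        {"name": "Alice", "email": "alice@example.org"},
        {"name": {"first": "Bob", "last": "Smith"},
         "phone": ["555-0100", "555-0101"]},
    ]
}


def children(node, label):
    """Return the nodes reached from `node` by one edge labeled `label`."""
    if not isinstance(node, dict) or label not in node:
        return []
    value = node[label]
    # A list stands for several outgoing edges carrying the same label.
    return value if isinstance(value, list) else [value]


def follow_path(root, labels):
    """Evaluate a simple path query: follow `labels` in order from `root`."""
    frontier = [root]
    for label in labels:
        frontier = [child for node in frontier
                    for child in children(node, label)]
    return frontier


# "name" is atomic for one person and structured for the other.
print(follow_path(db, ["person", "name"]))
# Only the entry that actually has a "first" edge contributes a result.
print(follow_path(db, ["person", "name", "first"]))
```

The path query simply skips entries that lack a requested label, which is one common way of coping with irregular data; practical query languages for semistructured data offer considerably more than this.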
While researchers mostly agree on a common definition of semistructured data, there is still considerable confusion about the logical foundations for representing and querying such data: several practical query languages have been proposed, but their formal foundations and their relationships to logical formalisms are poorly understood. This lack of understanding in turn prevents us from designing general solutions to typical data management problems, such as building indexes, optimizing queries, and designing storage structures. To add to the confusion, the structured document community has for several years studied "structured text" and proposed a number of algebraic operators and accompanying index structures to express queries over it. This work is clearly relevant to semistructured data, but the connections between the two are still poorly understood. Current work in academia and research institutions is studying the nature of query languages for semistructured data, and proposing index structures, optimization techniques, and storage mechanisms to support those queries.
This seminar brought together database researchers, logicians, and researchers in structured documents, as well as people from other communities related to the area of semistructured data, such as information retrieval, programming languages, and discrete algorithms. Besides the presentation of recent research results by the participants, additional goals were:
- to identify the main issues for further foundational research on semistructured data,
- to improve the mutual understanding of the communities involved concerning their respective settings and needs.
Participants
- Serge Abiteboul (University of Paris South XI, FR) [dblp]
- Franz Baader (TU Dresden, DE) [dblp]
- Michael Benedikt (Bell Labs - Lisle, US) [dblp]
- Alexandru Berlea (Universität Trier, DE)
- Anne Brüggemann-Klein (TU München, DE)
- François Bry (LMU München, DE) [dblp]
- Peter Buneman (University of Edinburgh, GB) [dblp]
- Diego Calvanese (Free University of Bozen-Bolzano, IT) [dblp]
- Peter Fankhauser (Fraunhofer Institut - Darmstadt, DE)
- Juliana Freire (Bell Labs - Murray Hill, US) [dblp]
- Philippa Gardner (Imperial College London, GB) [dblp]
- Giorgio Ghelli (University of Pisa, IT)
- Georg Gottlob (TU Wien, AT) [dblp]
- Gösta Grahne (Concordia Univ. - Montreal, CA)
- Martin Grohe (HU Berlin, DE) [dblp]
- Jan Hidders (University of Antwerp, BE) [dblp]
- Christoph Koch (TU Wien, AT) [dblp]
- Nikolaus Koudas (AT&T Labs Research - Florham Park, US)
- Alberto Laender (Federal University of Minas Gerais - Belo Horizonte, BR)
- Laks Lakshmanan (University of British Columbia - Vancouver, CA)
- Hans Leiß (LMU München, DE)
- Ling Liu (Georgia Institute of Technology - Atlanta, US) [dblp]
- David Maier (Oregon Health & Science University - Beaverton, US)
- Alberto O. Mendelzon (University of Toronto, CA)
- Holger Meuss (LMU München, DE)
- Gerome Miklau (University of Washington - Seattle, US) [dblp]
- Uwe Mönnich (Universität Tübingen, DE) [dblp]
- Frank Morawietz (Universität Tübingen, DE)
- Frank Neven (University of Limburg, BE) [dblp]
- Werner Nutt (Heriot-Watt University Edinburgh, GB)
- Arnaud Sahuguet (University of Pennsylvania - Philadelphia, US) [dblp]
- Vladimir Sazonov (University of Liverpool, GB)
- Klaus U. Schulz (LMU München, DE)
- Thomas Schwentick (TU Dortmund, DE) [dblp]
- Luc Segoufin (University of Paris South XI, FR) [dblp]
- Helmut Seidl (TU München, DE) [dblp]
- Dan Suciu (University of Washington - Seattle, US) [dblp]
- Val Tannen (University of Pennsylvania - Philadelphia, US) [dblp]
- Jan Van den Bussche (Hasselt University - Diepenbeek, BE) [dblp]
- Stijn Vansummeren (University of Limburg, BE) [dblp]
- Emmanuel Waller (Université Paris Sud, FR)
- Fang Wei (Universität Freiburg, DE)