Dagstuhl Seminar 12171
Semantic Data Management
(Apr 22 – Apr 27, 2012)
Organizers
- Karl Aberer (EPFL - Lausanne, CH)
- Grigoris Antoniou (University of Huddersfield, GB)
- Oscar Corcho (Technical University of Madrid, ES)
- Rudi Studer (KIT - Karlsruher Institut für Technologie, DE)
Coordinator
- Elena Simperl (KIT - Karlsruher Institut für Technologie, DE)
Contact
- Annette Beyer (for administrative matters)
Press Reviews
- Dagstuhl: Semantic Data Management
Blog entry by Paul Groth, published May 1, 2012.
The Semantic Web represents the next-generation World Wide Web, where information is published and interlinked in order to facilitate the exploitation of its structure and semantics (meaning) for both humans and machines. To foster the realization of the Semantic Web, the World Wide Web Consortium (W3C) developed a set of metadata (RDF), ontology (RDF Schema and OWL variants), and query (e.g., SPARQL) languages. Research in the past years has been mostly concerned with the definition and implementation of these languages, the development of accompanying ontology technologies, and applications in various domains. This work has been very successful, and semantic web technologies are being increasingly adopted by mainstream corporations and governments (for example, by the UK and US governments) and by several scientific communities (for example, the life sciences and astronomy). Moreover, semantic technologies are at the core of future developments, e.g., in the UK Open Data Institute.
However, compared to more traditional solutions, semantic technologies often appear to be immature, and current tools lag behind in terms of the efficient handling of large data sets. What is additionally needed are solid data management concepts, architectures, and tools that follow the paradigms of more traditional database (DB) and information retrieval (IR) systems. Semantic data management refers to a range of techniques for the manipulation and usage of data based on its meaning.
The aim of this workshop was to discuss in depth a number of crucial issues, with particular emphasis on the fruitful exchange of ideas between the semantic web, database systems, and information retrieval communities. Relevant key questions cutting across all topics covered were: (i) how can existing DB and IR solutions be adapted to manage semantic data; and (ii) are there new challenges that arise for the DB and IR communities (i.e., are radically new techniques required)?
For the purposes of this workshop, and for this report, we understand semantic data simply as data expressed in RDF, the lingua franca of linked open data and hence the default data model for annotating data on the Web. The workshop was organized along the following key themes:
- Scalability. For semantic technologies to gain the targeted market share, technological progress must allow semantic repositories to scale to the large amount of semantic data that is already available and keeps growing. It is essential to come close to performance parity with some of the best DB solutions without giving up the advantage of higher schema flexibility compared to the relational model. Moreover, the exploitation of semantic data on the Web requires managing a scale that so far can only be handled by the major search engine providers. However, this should be possible without losing the advantage of higher query expressivity compared to basic key-value stores and IR solutions.
- Provenance. An important aspect when integrating data from a large number of heterogeneous sources under diverse ownership is the provenance of data or parts thereof; provenance denotes the origin of data and can also include information on processing or reasoning operations carried out on the data. In addition, provenance allows for effectively supporting trust mechanisms and policies for privacy and rights management.
- Dynamicity. Another important property of much (semantic) data is its dynamicity. While some data, such as public administration archives or collections of text documents, might not change too frequently, other data, coming from sensors, RSS, user-generated content (e.g., microblogging), etc., might evolve on a per-millisecond basis. The effects of such changes have to be addressed through a combination of stream processing, mining, and semantics-based techniques.
- Search and Ranking. The large and growing amount of semantic data enables new kinds of applications. At the same time, more data means that ultimately there might be more results produced from it than one can or wishes to inspect. Data and results vary in their degree of relevance to a concrete information need. Effective ranking mechanisms that take the information need as well as contextual information into account can deliver and rank pertinent results and help users focus on the part of the data that is relevant.
Participants
- Karl Aberer (EPFL - Lausanne, CH)
- Grigoris Antoniou (University of Huddersfield, GB)
- Marcelo Arenas (Pontificia Universidad Catolica de Chile, CL)
- Wolf-Tilo Balke (TU Braunschweig, DE)
- James Cheney (University of Edinburgh, GB)
- Oscar Corcho (Technical University of Madrid, ES)
- Philippe Cudré-Mauroux (University of Fribourg, CH)
- Gianluca Demartini (University of Fribourg, CH)
- Orri Erling (Openlink Software, NL)
- Dieter Fensel (Universität Innsbruck, AT)
- Norbert Fuhr (Universität Duisburg-Essen, DE)
- Avigdor Gal (Technion - Haifa, IL)
- José Manuel Gómez-Pérez (ISOCO - Madrid, ES)
- Alasdair J G Gray (University of Manchester, GB)
- Marko Grobelnik (Jozef Stefan Institute - Ljubljana, SI)
- Paul Groth (VU University Amsterdam, NL)
- Andrey Gubichev (TU München, DE)
- Peter Haase (fluid Operations AG - Walldorf, DE)
- Stephen Harris (Garlik Ltd. - London, GB)
- Olaf Hartig (HU Berlin, DE)
- Manfred Hauswirth (National University of Ireland - Galway, IE)
- Jeff Heflin (Lehigh University - Bethlehem, US)
- Spyros Kotoulas (IBM Research - Dublin, IE)
- Paolo Missier (University of Newcastle, GB)
- Luc Moreau (University of Southampton, GB)
- Charalampos Nikolaou (University of Athens, GR)
- Ivana Podnar Zarko (University of Zagreb, HR)
- Edna Ruckhaus (Universidad S. Bolivar - Caracas, VE)
- Satya S. Sahoo (Case Western Reserve University - Cleveland, US)
- Manuel Salvadores (Stanford University, US)
- Ralf Schenkel (MPI für Informatik - Saarbrücken, DE)
- Juan F. Sequeda (University of Texas - Austin, US)
- Wolf Siberski (Leibniz Universität Hannover, DE)
- Elena Simperl (KIT - Karlsruher Institut für Technologie, DE)
- Kavitha Srinivas (IBM TJ Watson Research Center - Yorktown Heights, US)
- Rudi Studer (KIT - Karlsruher Institut für Technologie, DE)
- Kerry Taylor (CSIRO - Canberra, AU)
- Martin Theobald (MPI für Informatik - Saarbrücken, DE)
- Bryan Thompson (SYSTAP - Greensboro, US)
- Duc Thanh Tran (KIT - Karlsruher Institut für Technologie, DE)
- Frank van Harmelen (VU University Amsterdam, NL)
- Maria-Esther Vidal (Universidad S. Bolivar - Caracas, VE)
- Valentin Zacharias (FZI - Karlsruhe, DE)
Classification
- Semantics/formal methods
- Databases/information retrieval
Keywords
- Semantic data
- Semantic Web
- Linked Data
- Large-scale data management
- Dynamicity and stream processing
- Provenance and access control
- Information retrieval and ranking