Dagstuhl Seminar 19211
Enumeration in Data Management
( May 19 – May 24, 2019 )
Permalink
Organizers
- Endre Boros (Rutgers University - Piscataway, US)
- Benny Kimelfeld (Technion - Haifa, IL)
- Reinhard Pichler (TU Wien, AT)
- Nicole Schweikardt (HU Berlin, DE)
Contact
- Andreas Dolzmann (for scientific matters)
- Annette Beyer (for administrative matters)
Schedule
The general task of enumeration is to evaluate a function that maps a given input into a (possibly large) collection of answers. Traditional examples include the maximal independent sets of a graph (or a hypergraph), the simple paths from a source to a destination in a network, the satisfying assignments of a logical formula, and the tuples in the result of a database query. The ability to enumerate with short delays, particularly "polynomial delay" and "incremental polynomial delay," has taken a significant focus by both theory and system-oriented researchers throughout the history of database research. For example, an algorithm for enumerating the minimal keys of a relation with incremental polynomial delay was already presented in the 1970s. Moreover, in recent years, various computational problems at the core of data management have been revisited through the lenses of enumeration algorithms and complexity. One class of problems includes data-mining challenges such as frequent patterns in graphs and communities in social networks. Another class of problems is driven by query optimization and includes the enumeration of tree decompositions and provenance expressions. Additional examples can be found in Information Retrieval from structured data, Information Extraction from text, data integration, and database cleaning.
While different in nature, the analyses of these problems share a few algorithmic techniques for devising solvers, and proof techniques for establishing complexity bounds. Consequently, a recent effort has been initiated to define suitable complexity classes for enumeration problems. This theory requires revisiting concepts such as computational hardness, reductions between computational problems, and complexity assumptions for lower bounds. As another direction, the fields of Logic in Computer Science and Database Theory have seen a number of contributions that deal with a stronger notion of efficiency of enumeration of query results. In this scenario, the objective is as follows: given a logical structure (i.e., a database) and a logical formula (i.e., a query), after a short preprocessing phase, the query results shall be generated one by one, without repetition, with guarantees on the maximum delay time between the output of two tuples. In this vein, the best that one can hope for is constant delay (i.e., the delay may depend on the size of the query but not on that of the database) and linear preprocessing time. Established results include the complete classification of classes of conjunctive queries into the queries that admit a constant delay following a linear preprocessing time, and the ones that do not. The lower bounds are based on algorithmic assumptions on limitation in Polynomial Time, such as the complexity of multiplying Boolean matrices. The upper bounds (constant delay) are based on the construction of auxiliary data structures (indexes). The natural question of whether such structures can be maintained efficiently (sub-linearly) has led to the pursue and significant progress of the "dynamic complexity" of enumeration. Constant-delay enumeration has also been adopted as a central concept in "factorized databases" that have gained recent attention.
This week-long Dagstuhl Seminar targets two main goals. The first goal is to better understand the recent developments, lay out the most important open problems, and join forces towards solutions thereof. To this end, we plan to schedule invited tutorials, and ask tutorial speakers to prepare lists of open problems. Moreover, we plan to select a few of these open problems and run small working groups to focus on each. The second goal is to establish and disseminate a toolkit for data-centric enumeration problems. We will seek tools of various types. One type is algorithmic techniques for enumeration with different guarantees, such as methods for hereditary graph properties, schemes for ranked enumeration, and reductions to query answering with constant delay. Another type of tools consists of proof techniques for lower bounds, including indicator problems such as the hypergraph transversals, Boolean matrix multiplication, and Boolean online matrix-vector multiplication.
In recent years, various concepts of enumeration have arisen in the fields of Databases, Computational Logic, and Algorithms, motivated by applications of data analysis and query evaluation. Common to all concepts is the desire to compute a stream of items with as small as possible waiting time between consecutive items, referred to as the "delay." Alongside each concept, there evolved algorithmic techniques for developing solvers, and proof techniques for establishing complexity bounds. In addition to the traditional guarantees of "polynomial delay" and "incremental polynomial," researchers have been pursuing stronger guarantees such as "constant delay" in the context of logical query evaluation, "dynamic complexity" of incremental maintenance, and "factorized databases." The growing interest and rapid evolution of the associated research brings up opportunities of significantly accelerating the computation of big results, by devising and adopting general-purpose methodologies.
In Dagstuhl Seminar 19211 on “Enumeration in Data Management,” key researchers from relevant communities have gathered to gain a better understanding the recent developments, lay out the important open problems, and join forces towards solutions thereof. These communities include researchers who explore enumeration problems in the fields of databases, logic, algorithms and computational complexity. We have had invited tutorials by- Luc Segoufin on Constant-delay enumeration
- Takeaki Uno on Enumeration algorithms
- Yann Strozecki on Enumeration complexity – defining tractability
- Markus Kröll on Enumeration complexity – a complexity theory for hard enumeration problems
- Endre Boros on Monotone generation problems.
We also had presentations by most of the other participants. Moreover, the participants have prepared in advance a list of open problems in a document that we shared and jointly maintained. We have discussed the open problems during designated times of the seminar.
The organizers are highly satisfied with the seminar. We have got a very high acceptance rate for our invitations. In fact, there were further researchers whom we would have liked to invite after the first invitation round but, unfortunately, no room was left. The participants were exceptionally involved and engaged. Some considerable progress has been made on the open problems prepared in advance, as will be reported in future publications that will acknowledge the seminar. The seminar has also initiated joint efforts to disseminate toolkits for data-centric enumeration problems, including algorithmic techniques, proof techniques, and important indicator problems. To this end, we have had sessions of working groups for the different types of toolkit components. In particular, we have initiated a Wikipedia page on enumeration algorithms:
https://en.wikipedia.org/wiki/Enumeration_algorithm
This page will evolve to contain a thorough picture of the principles and techniques of enumeration problems.
- Amir Abboud (IBM Almaden Center - San Jose, US) [dblp]
- Kira V. Adaricheva (Hofstra University - Hempstead, US) [dblp]
- Antoine Amarilli (Telecom ParisTech, FR) [dblp]
- Kristof Berczi (Eötvös Lorand University - Budapest, HU) [dblp]
- Christoph Berkholz (HU Berlin, DE) [dblp]
- Endre Boros (Rutgers University - Piscataway, US) [dblp]
- Pierre Bourhis (CNRS - CRIStAL, Lille, FR) [dblp]
- Nofar Carmeli (Technion - Haifa, IL) [dblp]
- Ondrej Cepek (Charles University - Prague, CZ) [dblp]
- Nadia Creignou (Aix-Marseille University, FR) [dblp]
- Arnaud Durand (University Paris-Diderot, FR) [dblp]
- Khaled M. Elbassioni (Khalifa University - Abu Dhabi, AE) [dblp]
- Etienne Grandjean (Caen University, FR) [dblp]
- Alejandro J. Grez (Pontificia Universidad Catolica de Chile, CL) [dblp]
- Aritanan Gruber (University of ABC - Santo André, BR) [dblp]
- Mamadou Moustapha Kanté (Université Clermont Auvergne - Aubiere, FR) [dblp]
- Batya Kenig (University of Washington - Seattle, US) [dblp]
- Benny Kimelfeld (Technion - Haifa, IL) [dblp]
- Christoph Koch (EPFL - Lausanne, CH) [dblp]
- Phokion G. Kolaitis (University of California - Santa Cruz, US) [dblp]
- Markus Kröll (TU Wien, AT) [dblp]
- Ester Livshits (Technion - Haifa, IL) [dblp]
- Kazuhisa Makino (Kyoto University, JP) [dblp]
- Andrea Marino (Univerisità degli Studi di Firenze, IT) [dblp]
- Wim Martens (Universität Bayreuth, DE) [dblp]
- Stefan Mengel (CNRS, CRIL - Lens, FR) [dblp]
- Shin-Ichi Nakano (Guuma University - Kiryu, JP) [dblp]
- Matthias Niewerth (Universität Bayreuth, DE) [dblp]
- Lhouari Nourine (University Clermont Auvergne, FR) [dblp]
- Liat Peterfreund (Technion - Haifa, IL) [dblp]
- Reinhard Pichler (TU Wien, AT) [dblp]
- Cristian Riveros (Pontificia Universidad Catolica de Chile, CL) [dblp]
- Yehoshua Sagiv (The Hebrew University of Jerusalem, IL) [dblp]
- Nicole Schweikardt (HU Berlin, DE) [dblp]
- Thomas Schwentick (TU Dortmund, DE) [dblp]
- Luc Segoufin (INRIA & ENS Paris, FR) [dblp]
- Yann Strozecki (University of Versailles, FR) [dblp]
- Alexandre Termier (IRISA - University of Rennes, FR) [dblp]
- Etsuji Tomita (The University of Electro-Communications - Tokyo, JP) [dblp]
- György Turan (University of Illinois - Chicago, US) [dblp]
- Martin Ugarte (PUC Chile, CL) [dblp]
- Takeaki Uno (National Institute of Informatics - Tokyo, JP) [dblp]
- Stijn Vansummeren (Free University of Brussels, BE) [dblp]
- Alexandre Vigny (University of Warsaw, PL) [dblp]
- Heribert Vollmer (Leibniz Universität Hannover, DE) [dblp]
- Thomas Zeume (TU Dortmund, DE) [dblp]
Classification
- data bases / information retrieval
- data structures / algorithms / complexity
- verification / logic
Keywords
- Enumeration
- polynomial delay
- constant delay
- dynamic complexity
- databases
- query evaluation