Dagstuhl Seminar 21071
Scalable Data Structures
( Feb 14 – Feb 19, 2021 )
Permalink
Organizers
- Gerth Stølting Brodal (Aarhus University, DK)
- John Iacono (UL - Brussels, BE)
- Markus E. Nebel (Universität Bielefeld, DE)
- Vijaya Ramachandran (University of Texas - Austin, US)
Contact
- Shida Kunz (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Schedule
Data is at the core of computing: Computing is about processing, exchanging, and storing data. The organization of data profoundly influences the performance of accessing and manipulating data. By optimizing the way data is stored, performance can be improved by several orders of magnitude when data scales.
Even if the field of data structures is quite mature, new trends and limitations in computer hardware together with the ever-increasing amounts of data that need to be processed raise new questions with respect to efficiency and continuously challenge the existing models of computation. Thermal and electrical power constraints have caused technology to reach the “the power wall” with stagnating single processor performance, meaning that all nontrivial applications need to address scalability with multiple processors, a memory hierarchy and other communication challenges. Scalable data structures are pivotal to this process since they form the backbone of the algorithms driving these applications.
Over the last three decades there has been a successful sequence of Dagstuhl seminars on Data Structures held approximately biennially. This Dagstuhl Seminar shall bring together researchers from several research directions to illuminate solutions to the scalability challenge of data structures. Experts from algorithm theory and algorithm engineering can contribute the latest results from approximate data structures, hashing, streaming data analysis, space-efficient and succinct data structures, sublinear space data structures, fine-grained complexity, computation on noisy data, and fault tolerant computing. On the other hand, experts on machine models, parallel computing, and grand challenge applications will illuminate the context in which these data structures will be used, and associated results and techniques.
Particular emphasis will be on the growing number of processing elements in multicore processors, GPUs, and supercomputers. This Dagstuhl Seminar shall catalyze this direction of research by bringing practitioners on parallel computing together with the data structure community.
About the seminar
Scalable data structures form the backbone for computing: Computing is about processing, exchanging, and storing data. The organization of data profoundly influences the performance of accessing and manipulating data. By optimizing the way data is stored, performance can be improved by several orders of magnitude when data scales. This Dagstuhl seminar brought together researchers from several research directions to illuminate solutions to the scalability challenge of data structures. The seminar was the 14th in a series of loosely related Dagstuhl seminars on data structures. Due to the ongoing Covid-19 pandemic the seminar was purely virtual.
Topics
The presentations covered both advances in classic data structure fields, as well as insights addressing the scalability of computing for different models of computation.
Classic data structure questions on dictionaries, hashing, filters, and heaps were the topic of several talks. Wild (Section 4.8) considered new pointer based search trees, Kozma (Section 4.24) discussed algorithms related to self-adjusting trees and heaps, Bercea (Section 4.4) presented results for dictionaries and filters, Even (Section 4.21) discussed dynamic stable perfect hashing, and Farach-Colton (Section 4.2) presented results for succinct stable hash tables. Johnson (Section 4.25) presented the vector quotient filter based on Robin Hood hashing and designed to exploit SIMD instructions. An external memory dictionary was presented by Conway (Section 4.20) who presented the SplinterDB key-value store for NVMe solid state drives.
For data structure problems on strings, Gørtz (Section 4.9) discussed the support for random access in compact representations of strings, and Starikovskaya (Section 4.22) dictionary look-ups with mismatches.
Data structures for storing and querying static and dynamic graphs were the topic of a sequence of talks. Pettie (Section 4.1) considered trade-offs between space usage and query time for supporting exact distance queries in planar graphs. Rotenberg (Section 4.5) considered planarity testing of dynamic graphs under the insertion and deletion of edges with polylogarithmic update time. Kopelowitz (Section 4.6) considered maintaining the orientation of edges in dynamic forests under the insertion of edges, guaranteeing low out degree of all nodes. Henzinger (Section 4.16) presented an algorithm for maintaining a (Δ+1)-vertex coloring of a graph with maximal degree Δ with constant time edge insertions and deletions. Bast (Section 4.27) gave a demonstration of an implementation of algorithms for real-time searching knowledge graphs with billions of edges.
Parallel algorithms for problems on graphs were addressed in multiple talks. Liu (Section 4.10) considered a parallel algorithm for counting triangles (cliques of size three) in graphs under batched updates of edge insertions and deletions, and Blelloch (Section 4.15) considered parallel batched dynamic algorithms for the minimum spanning tree and minimum cut problems. Sun (Section 4.13) considered a parallel algorithm for the single source shortest path problem using lazy batched priority queues. Shun (Section 4.3) considered a parallel index-based algorithm for graph clustering and an approximation algorithm using locality-sensitive hashing.
Computational models supporting massive parallelism, like GPUs and TCUs, were addressed in talks by Owens (Section 4.11) who considered open-addressing hashing on GPUs, by Geil (Section 4.18) who consider how to solve the maximum clique problem on GPUs, and by Silvestri (Section 4.14) who addressed similarity search with tensor core units. Sitchinava & Jacob (Section 4.19) in their joint talk, considered the power of the atomic and non-atomic versions of the parallel fork-join model. Sanders (Section 4.12) considered how to execute MapReduce computations robustly and efficiently on realistic distributed-memory parallel machines.
Ellen (Section 4.17) considered labelling schemes for networks supporting distributed deterministic radio broadcast using labels of constant-length at the nodes of the network. Lincoln (Section 4.23) presented new techniques for proving fine-grained average-case hardness results, and Fagerberg (Section 4.26) considered the fragile complexity of adaptive algorithms. Finally, Driemel (Section 4.7) considered approximate-near-neighbor data structures for time series under the continuous Fréchet distance.
Final Thoughts
The organizers would like to thank the Dagstuhl team for their continuous support and allowing this seminar to happen as a purely virtual Dagstuhl seminar. They also thank all participants for their contributions to this seminar.
Previous seminars in the series had few female participants. A focus for this seminar was to significantly increase the female attendance. 50% of the invited participants were female, resulting in a 38% female attendance.
Even though the seminar was challenged by the different time zones of the participants, on average 37 of the 48 participants participants attended the talks, and all talks were attended by at least 30 partcipants. In the post-seminar survey it was appreciated that the seminar took place as a virtual seminar instead of being cancelled, but it was also stated that the virtual format can never be as productive as an in-person seminar and showed how much we should appreciate the possibilities Dagstuhl offers under regular circumstances.
- Kunal Agrawal (Washington University - St. Louis, US) [dblp]
- Elena Arseneva (St. Petersburg State University, RU) [dblp]
- Hannah Bast (Universität Freiburg, DE) [dblp]
- Ioana Oriana Bercea (Tel Aviv University, IL) [dblp]
- Guy E. Blelloch (Carnegie Mellon University - Pittsburgh, US) [dblp]
- Gerth Stølting Brodal (Aarhus University, DK) [dblp]
- Rezaul Chowdhury (Stony Brook University, US) [dblp]
- Alexander Conway (VMware Research - Palo Alto, US) [dblp]
- Martin Dietzfelbinger (TU Ilmenau, DE) [dblp]
- Anne Driemel (Universität Bonn, DE) [dblp]
- Amalia Duch Brown (UPC Barcelona Tech, ES) [dblp]
- Faith Ellen (University of Toronto, CA) [dblp]
- Guy Even (Tel Aviv University, IL) [dblp]
- Rolf Fagerberg (University of Southern Denmark - Odense, DK) [dblp]
- Martin Farach-Colton (Rutgers University - Piscataway, US) [dblp]
- Afton Noelle Geil (University of California, Davis, US) [dblp]
- Inge Li Gørtz (Technical University of Denmark - Lyngby, DK) [dblp]
- Monika Henzinger (Universität Wien, AT) [dblp]
- John Iacono (UL - Brussels, BE) [dblp]
- Riko Jacob (IT University of Copenhagen, DK) [dblp]
- Rob Johnson (VMware - Palo Alto, US) [dblp]
- Valerie King (University of Victoria, CA) [dblp]
- Tsvi Kopelowitz (Bar-Ilan University - Ramat Gan, IL) [dblp]
- László Kozma (FU Berlin, DE) [dblp]
- Moshe Lewenstein (Bar-Ilan University - Ramat Gan, IL) [dblp]
- Andrea Lincoln (MIT - Cambridge, US) [dblp]
- Quanquan C. Liu (MIT - Cambridge, US) [dblp]
- Ulrich Carsten Meyer (Goethe-Universität - Frankfurt am Main, DE) [dblp]
- Ian Munro (University of Waterloo, CA) [dblp]
- Markus E. Nebel (Universität Bielefeld, DE) [dblp]
- Eunjin Oh (POSTECH - Pohang, KR) [dblp]
- Rotem Oshman (Tel Aviv University, IL) [dblp]
- John Owens (University of California, Davis, US) [dblp]
- Seth Pettie (University of Michigan - Ann Arbor, US) [dblp]
- Vijaya Ramachandran (University of Texas - Austin, US) [dblp]
- Rajeev Raman (University of Leicester, GB) [dblp]
- Eva Rotenberg (Technical University of Denmark - Lyngby, DK) [dblp]
- Peter Sanders (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Robert Sedgewick (Princeton University, US) [dblp]
- Julian Shun (MIT - Cambridge, US) [dblp]
- Francesco Silvestri (University of Padova, IT) [dblp]
- Nodari Sitchinava (University of Hawaii at Manoa - Honolulu, US) [dblp]
- Tatiana Starikovskaya (ENS - Paris, FR) [dblp]
- Jesper Steensgard (Aarhus University, DK)
- Yihan Sun (University of California - Riverside, US) [dblp]
- Robert Endre Tarjan (Princeton University, US) [dblp]
- Manoharan Vignesh (University of Texas - Austin, US)
- Sebastian Wild (University of Liverpool, GB) [dblp]
Related Seminars
- Dagstuhl Seminar 9145: Data Structures (1991-11-04 - 1991-11-08) (Details)
- Dagstuhl Seminar 9409: Data Structures (1994-02-28 - 1994-03-04) (Details)
- Dagstuhl Seminar 9609: Data Structures (1996-02-26 - 1996-03-01) (Details)
- Dagstuhl Seminar 98091: Data Structures (1998-03-02 - 1998-03-06) (Details)
- Dagstuhl Seminar 00091: Data Structures (2000-02-27 - 2000-03-03) (Details)
- Dagstuhl Seminar 02091: Data Structures (2002-02-24 - 2002-03-01) (Details)
- Dagstuhl Seminar 04091: Data Structures (2004-02-22 - 2004-02-27) (Details)
- Dagstuhl Seminar 06091: Data Structures (2006-02-26 - 2006-03-03) (Details)
- Dagstuhl Seminar 08081: Data Structures (2008-02-17 - 2008-02-22) (Details)
- Dagstuhl Seminar 10091: Data Structures (2010-02-28 - 2010-03-05) (Details)
- Dagstuhl Seminar 14091: Data Structures and Advanced Models of Computation on Big Data (2014-02-23 - 2014-02-28) (Details)
- Dagstuhl Seminar 16101: Data Structures and Advanced Models of Computation on Big Data (2016-03-06 - 2016-03-11) (Details)
- Dagstuhl Seminar 19051: Data Structures for the Cloud and External Memory Data (2019-01-27 - 2019-02-01) (Details)
- Dagstuhl Seminar 23211: Scalable Data Structures (2023-05-21 - 2023-05-26) (Details)
- Dagstuhl Seminar 25191: Adaptive and Scalable Data Structures (2025-05-04 - 2025-05-09) (Details)
Classification
- data bases / information retrieval
- data structures / algorithms / complexity
Keywords
- Data structures
- algorithms
- computational models
- big data
- parallel computation