Dagstuhl-Seminar 23211
Scalable Data Structures
( 21. May – 26. May, 2023 )
Permalink
Organisatoren
- Gerth Stølting Brodal (Aarhus University, DK)
- John Iacono (UL - Brussels, BE)
- László Kozma (FU Berlin, DE)
- Vijaya Ramachandran (University of Texas - Austin, US)
Kontakt
- Marsha Kleinbauer (für wissenschaftliche Fragen)
- Jutka Gasiorowski (für administrative Fragen)
Impacts
- Finding the saddlepoint faster than sorting : article in 2024 Symposium on Simplicity in Algorithms (SOSA) - Dallant, Justin; Haagensen, Frederik; Jacob, Riko; Kozma, László; Wild, Sebastian - Philadelphia : SIAM, 2024. - 11 pp..
- Deterministic Cache-Oblivious Funnelselect : article in 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024) - - Brodal, Gerth Stølting; Wild, Sebastian - Wadern : LZI, 2024. - pp. 17:1-17:12 - (Leibniz International Proceedings in Informatics ; 294 : article).
- Funnelselect : Cache-Oblivious Multiple Selection : article in 31st Annual European Symposium on Algorithms (ESA 2023) - Brodal, Gerth Stølting; Wild, Sebastian - Wadern : LZI, 2023. - pp. 25:1-25:17 - (Leibniz International Proceedings in Informatics ; 274 : article).
Programm
About the seminar
The scale at which data is generated and processed is increasing unabated and novel applications continue to arise, posing new challenges for data structure design and analysis. The performance of data structures can dramatically affect the overall efficiency of computing systems, motivating research on scalable data structures along the entire spectrum from purely theoretical to purely empirical.
The focus of data structure research has continuously shifted to better align with the changing realities of the underlying hardware (e.g. by refining computational models to capture memory hierarchies and parallelism), and the requirements of applications (e.g. by finding the right input models, novel modes of operation, and special requirements such as data privacy or adapting to side-information). Suitable data structuring abstractions have often been crucial components of algorithmic breakthroughs, for instance in static or dynamic graph algorithms, e.g. for maximum flow or minimum spanning trees. Research on classical problems and long-standing open questions continues, with surprising recent improvements, e.g. for list labeling.
Data structure research has been a core part of computer science from the beginnings, and the field appears as vibrant as ever, with research continuing on deep old questions, as well as on new directions reflecting the changing realities of the computational landscape. This seminar was the 15th in a series of loosely related Dagstuhl Seminars on data structures, bringing together researchers from several research directions to illuminate various aspects and solutions to the problem of data structure scalability. Following the previous, purely virtual meeting, the seminar was organized as a fully on-site event.
Topics
The presentations covered both advances in classic data structure fields, as well as insights that addressed the scalability of computing in different models of computation and across a diverse range of applications. In the following, we make reference to the relevant sections of the full report.
Parallelism was an important theme of the seminar. Blelloch (Section 4.1) discussed possible ways of making parallelism a core part of a computer science curriculum, Agrawal (Section 4.12) talked about incorporating data structures in parallel algorithms, and Sun (Section 4.5) presented algorithms for Longest Increasing Subsequence, building on parallel data structures.
Classic questions of data structure design and analysis in the comparison- or pointer-based models were the topic of multiple talks. Tarjan (Section 4.17) discussed various self-adjusting heaps and new results on their amortized efficiency. Munro (Section 4.24) talked about the ordered majority problem. Sorting was the subject of a series of talks: Wild (Section 4.3) talked about new, practically efficient sorting algorithms used in Python. Nebel (Section 4.6) talked about the practical efficiency of Lomuto’s QuickSort variant. Pettie (Section 4.21) studied efficiently sorting inputs with pattern-avoiding properties, and Jacob (Section 4.27) studied variants of the sorting problem with priced comparisons.
Several talks reflected the central importance of hashing in scalable data structures, giving a broad picture of modern developments in this area. Conway (Section 4.2) presented Iceberg Hashing and Mosaic Pages. Sanders (Section 4.7) presented Sliding Block Hashing. Farach-Colton (Section 4.9) considered a simplified hash table design with strong guarantees. Bercea (Section 4.18) presented a data structure for incremental stable perfect hashing, Johnson (Section 4.22) presented Maplets, and Even (Section 4.29) talked about dynamic filters with one memory access.
The very recent O(log3/2n)n)-time online list labeling result was presented by Wein (Section 4.28) and results for the online list labeling problem with machine learning advice were presented by Singh (Section 4.8).
Static and dynamic graph algorithms were the topic of several presentations. Kyng (Section 4.12) talked about dynamic spanners and data structuring problems arising in the context of minimum cost flow. King (Section 4.13) presented algorithms for dynamic connectivity and Rotenberg (Section 4.15) presented dynamic graph algorithms that are adaptive to sparsity of the input. For data structures in string problems, Gørtz (Section 4.20) discussed regular expression matching and Kopelowitz (Section 4.16) presented speedups of the dynamic program for the Dyck edit distance problem.
Multiple talks reflected the interplay between data structures and computational geometry: Dallant (Section 4.30) spoke about conditional lower bounds for dynamic geometric problems, Oh (Section 4.25) presented an algorithm for the planar disjoint paths problem, and Arseneva (Section 4.23) talked about morphing graph drawings, including results obtained jointly with Oh during the seminar.
Possible computational models for designing algorithms for GPUs were addressed by Sitchinava (Section 4.19) and models of in-memory processing were presented by Silvestri (Section 4.26).
Phillips (Section 4.10) presented the design and analysis of a large-scale stream monitoring system. Liu (Section 4.4) discussed the role of scalable data structures in the context of differential privacy for graph data. Xu (Section 4.11) presented a search-optimized layout for packed memory arrays.
Final thoughts
The organizers would like to thank the Dagstuhl team for their continuous support and also thank all participants for their contributions to this seminar. Following the earlier virtual seminar, the current (15th) seminar was fully on-site. The opportunity to personally meet and interact was highly appreciated by the community, as reflected by a very strong response to the first round of invitations and subsequent positive feedback. The seminar fills a unique need in bringing together data-structures-researchers from around the world and facilitating collaboration and exchange of ideas between them.
Earlier seminars in the series had few female participants. An important focus of the previous and the current seminar was to significantly increase female attendance. In the current seminar, 48% of the invited participants were female, resulting in a 37% female attendance. Another important focus of the seminar is to encourage the interaction between senior and early career researchers, the latter comprising 27% of the invited participants and 32% of the eventual attendees.
In the post-seminar survey the diversity of junior/senior and female/male participants were both appreciated, respondents also drawing attention to the importance of tactful and clear communication on these matters. The survey respondents also praised the coverage of a diverse range of research topics, as well as the mix between theoreticians and more applied researchers.
Data structures enable the efficient storage and retrieval of data in a variety of settings. Data structures are central to almost all of computing and have been an object of study since the beginning of the discipline of computer science. For a given abstract data type and a certain model of computation, speed and small space are the usual aims of data structures research. Due to the unprecedented scale at which data structures are deployed in applications, even small performance gains can have a significant effect. Thus, the study of data structures remains a very active area of research with unresolved old problems, new objects of interest, new techniques for analysis, and an interplay between theoretical data structure design, developments in computer hardware, and the needs of modern applications.
Data structures are not designed in isolation from the computing devices that they run on. The ultimate goal is to design data structures that are fast in practice. As hardware advances, models of computation must also evolve. The original and most simplistic models included the random access machine (RAM) and comparison models. Later models attempted to more accurately capture various hardware effects; these include the external memory/DAM model, the cache-oblivious model of multi-level memory systems and the streaming model. A compelling goal of data structure research is to develop algorithms that are optimal for several computational models at the same time, and as a counterpoint, to study inherent trade-offs between the cost measures of the different models, e.g., query time vs. space usage, read time vs. write time, computational time vs. block transfers to/from external memory, or optimality in the external memory model vs. optimality in the cache-oblivious model.
Parallel computation is crucial to the scalability of data structures and has attained particular importance as single processor hardware is no longer prevalent, even on mobile devices. Modern parallelism takes many forms, from several cores on a modern CPU to several machines on a network, to specialized hardware. Each of these settings requires us to revisit classical models of parallelism such as the PRAM, fork-join, and bulk synchronous parallel models, in order to evaluate how well they apply and then to develop effective methods. GPUs present their own challenges, as they are often the most powerful computing unit in a modern machine, yet have a particular architecture that is a vestige of their origins in graphics processing and defies simple modeling.
Dagstuhl Seminars on data structures have been held roughly biennially since 1991. These seminars have played a very important role in moving a community that was focused on theoretical analysis to implementation, experimentation, and deployment. This Dagstuhl Seminar will, in addition to the “classical” data structuring topics, place special emphasis on spawning new research within data structures for multithreaded, bulk-synchronous, GPU, and emerging computing paradigms such as tensor cores.
- Kunal Agrawal (Washington University - St. Louis, US) [dblp]
- Elena Arseneva (University of Lugano, CH) [dblp]
- Michael A. Bender (Stony Brook University, US) [dblp]
- Ioana Oriana Bercea (IT University of Copenhagen, DK) [dblp]
- Guy E. Blelloch (Carnegie Mellon University - Pittsburgh, US) [dblp]
- Gerth Stølting Brodal (Aarhus University, DK) [dblp]
- Alexander Conway (VMware Research - Palo Alto, US) [dblp]
- Justin Dallant (UL - Brussels, BE)
- Guy Even (Tel Aviv University, IL) [dblp]
- Rolf Fagerberg (University of Southern Denmark - Odense, DK) [dblp]
- Martin Farach-Colton (Rutgers University - Piscataway, US) [dblp]
- Inge Li Gørtz (Technical University of Denmark - Lyngby, DK) [dblp]
- John Iacono (UL - Brussels, BE) [dblp]
- Riko Jacob (IT University of Copenhagen, DK) [dblp]
- Rob Johnson (VMware - Palo Alto, US) [dblp]
- Valerie King (University of Victoria, CA) [dblp]
- Tsvi Kopelowitz (Bar-Ilan University - Ramat Gan, IL) [dblp]
- László Kozma (FU Berlin, DE) [dblp]
- Rasmus Kyng (ETH Zürich, CH) [dblp]
- Moshe Lewenstein (Bar-Ilan University - Ramat Gan, IL) [dblp]
- Quanquan C. Liu (Northwestern University - Evanston, US) [dblp]
- Ulrich Carsten Meyer (Goethe-Universität - Frankfurt am Main, DE) [dblp]
- Ian Munro (University of Waterloo, CA) [dblp]
- Markus E. Nebel (Universität Bielefeld, DE) [dblp]
- Eunjin Oh (POSTECH - Pohang, KR) [dblp]
- Rotem Oshman (Tel Aviv University, IL) [dblp]
- Seth Pettie (University of Michigan - Ann Arbor, US) [dblp]
- Cynthia A. Phillips (Sandia National Labs - Albuquerque, US) [dblp]
- Vijaya Ramachandran (University of Texas - Austin, US) [dblp]
- Rajeev Raman (University of Leicester, GB) [dblp]
- Eva Rotenberg (Technical University of Denmark - Lyngby, DK) [dblp]
- Peter Sanders (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Francesco Silvestri (University of Padova, IT) [dblp]
- Shikha Singh (Williams College - Williamstown, US) [dblp]
- Nodari Sitchinava (University of Hawaii at Manoa - Honolulu, US) [dblp]
- Yihan Sun (University of California - Riverside, US) [dblp]
- Robert Endre Tarjan (Princeton University, US) [dblp]
- Nicole Wein (Rutgers University - Piscataway, US) [dblp]
- Sebastian Wild (University of Liverpool, GB) [dblp]
- Helen Xu (Lawrence Berkeley National Laboratory, US) [dblp]
Verwandte Seminare
- Dagstuhl-Seminar 9145: Data Structures (1991-11-04 - 1991-11-08) (Details)
- Dagstuhl-Seminar 9409: Data Structures (1994-02-28 - 1994-03-04) (Details)
- Dagstuhl-Seminar 9609: Data Structures (1996-02-26 - 1996-03-01) (Details)
- Dagstuhl-Seminar 98091: Data Structures (1998-03-02 - 1998-03-06) (Details)
- Dagstuhl-Seminar 00091: Data Structures (2000-02-27 - 2000-03-03) (Details)
- Dagstuhl-Seminar 02091: Data Structures (2002-02-24 - 2002-03-01) (Details)
- Dagstuhl-Seminar 04091: Data Structures (2004-02-22 - 2004-02-27) (Details)
- Dagstuhl-Seminar 06091: Data Structures (2006-02-26 - 2006-03-03) (Details)
- Dagstuhl-Seminar 08081: Data Structures (2008-02-17 - 2008-02-22) (Details)
- Dagstuhl-Seminar 10091: Data Structures (2010-02-28 - 2010-03-05) (Details)
- Dagstuhl-Seminar 14091: Data Structures and Advanced Models of Computation on Big Data (2014-02-23 - 2014-02-28) (Details)
- Dagstuhl-Seminar 16101: Data Structures and Advanced Models of Computation on Big Data (2016-03-06 - 2016-03-11) (Details)
- Dagstuhl-Seminar 19051: Data Structures for the Cloud and External Memory Data (2019-01-27 - 2019-02-01) (Details)
- Dagstuhl-Seminar 21071: Scalable Data Structures (2021-02-14 - 2021-02-19) (Details)
- Dagstuhl-Seminar 25191: Adaptive and Scalable Data Structures (2025-05-04 - 2025-05-09) (Details)
Klassifikation
- Data Structures and Algorithms
- Information Retrieval
Schlagworte
- Data structures
- algorithms
- computational models
- big data
- parallel computation