Dagstuhl Seminar 24101
Robust Query Processing in the Cloud
(Mar 03 – Mar 08, 2024)
Organizers
- Anastasia Ailamaki (EPFL - Lausanne, CH)
- Goetz Graefe (Google - Madison, US)
- Allison Lee (Snowflake - San Mateo, US)
- Caetano Sauer (Salesforce - München, DE)
Contact
- Michael Gerke (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
The Dagstuhl Seminar on “Robust Query Processing in the Cloud” (24101) brought together researchers from industry and academia for the fifth time to discuss robustness issues in database query performance, this time with a focus on cloud computing. The participants, drawn from around the world, work on indexing, storage, plan generation, and plan execution in database query processing and in cloud-based massively parallel systems; the aim was to address open research challenges in the robustness of database management systems.
Delivering robust query performance is well known to be a difficult problem for database management systems. All experienced DBAs and database users are familiar with sudden disruptions in data centers caused by poor performance of queries that performed perfectly well in the past. The goal of the seminar was to discuss the current state of the art, to identify specific research opportunities for improving the state of affairs in query processing, and to develop new approaches or even solutions for these opportunities, building upon the successes of past Dagstuhl Seminars [1, 2, 3, 4, 5, 6].
The organizers (Goetz Graefe, Allison Lee, and Caetano Sauer) this time aimed for a focused subset of topics that the participants discussed and analyzed in more depth. From the proposed topics on algorithm choices, join sequences, storage architectures, database utilities, modern storage hardware, cloud database economics, and benchmarking for robust query processing, the participants formed five work groups: i) robustness benchmarking, ii) economics of query processing in the cloud, iii) storage architectures, iv) out-of-memory query operators, and v) indexing for data warehousing.
Once the topics were chosen, the organizers guided the participants through a sequence of steps: first, surveying related work in the area; second, defining the metrics and tests used to assess the validity and robustness of a solution; third, proposing specific mechanisms for the chosen approaches; and finally, working out implementation policies. At the end of the week, each group presented its progress, with the hope of continuing the work towards a research publication. The reports of the work groups are presented in the full report.
References
- [1] Peter A. Boncz, Yannis Chronis, Jan Finis, Stefan Halfpap, Viktor Leis, Thomas Neumann, Anisoara Nica, Caetano Sauer, Knut Stolze, and Marcin Zukowski. SPA: economical and workload-driven indexing for data analytics in the cloud. In 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023, pages 3740–3746. IEEE, 2023.
- [2] Renata Borovica-Gajic, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski, and Campbell Fraser. Smooth scan: Statistics-oblivious access paths. In Johannes Gehrke, Wolfgang Lehner, Kyuseok Shim, Sang Kyun Cha, and Guy M. Lohman, editors, ICDE, pages 315–326. IEEE Computer Society, 2015.
- [3] Renata Borovica-Gajic, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski, and Campbell Fraser. Smooth scan: robust access path selection without cardinality estimation. VLDB J., 27(4):521–545, 2018.
- [4] David Justen, Daniel Ritter, Campbell Fraser, Andrew Lamb, Nga Tran, Allison Lee, Thomas Bodner, Mhd Yamen Haddad, Steffen Zeuch, Volker Markl, and Matthias Boehm. POLAR: adaptive and non-invasive join order selection via plans of least resistance. Proc. VLDB Endow., 17(6):1350–1363, 2024.
- [5] Martin L. Kersten, Alfons Kemper, Volker Markl, Anisoara Nica, Meikel Poess, and Kai-Uwe Sattler. Tractor pulling on data warehouses. In Goetz Graefe and Kenneth Salem, editors, DBTest, page 7. ACM, 2011.
- [6] Lukas Vogel, Daniel Ritter, Danica Porobic, Pinar Tözün, Tianzheng Wang, and Alberto Lerner. Data pipes: Declarative control over data movement. In 13th Conference on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8-11, 2023. www.cidrdb.org, 2023.
Following up on earlier Dagstuhl Seminars on robust performance of database query processing, this Dagstuhl Seminar aims to discuss and advance multiple topics in database query processing, with a new focus on cloud-computing environments. Query performance, and in particular robust, predictable, reliable query performance (“good performance every time”), remains an open issue in research and in products. The problem is simple to describe, expensive to investigate and to alleviate, and has been well known for decades; it must therefore be a hard research problem, one that has to be continuously investigated and reframed as technology develops, rather than a simple industrial development problem. The ever-increasing popularity of cloud computing and the massive migration of database applications and services to this new environment add a new dimension of complexity to this long-standing problem.
As in our previous seminars, we hope to achieve mutual education as well as concrete solutions for specific hard problems, and possibly publications based on the seminar and collaboration initiated during the seminar. To guide the seminar discussion, we propose five technical topics that will potentially seed discussions in disjoint groups of participants:
- Robust query execution of complex join sequences: As a pragmatic solution to the join sequence problem in query optimization, previous instances of this seminar (in particular 22111 and 17222) investigated robust query execution techniques that are able to adaptively choose among different join sequence alternatives during runtime. In this new seminar, we aim to further investigate techniques like multiplexing plans and dynamic reoptimization to alleviate the burden of choosing “the best plan” upfront. In the context of the cloud, the solution space can be taken even further, since the elastic compute resources and disaggregated storage make it more feasible to run alternative query plans in parallel (i.e., “race plans”).
- Robust database maintenance: Database system performance is often impacted by large maintenance operations that run concurrently with user queries and transactions (e.g., indexing, schema changes, physical reorganization, and the computation of samples or statistics). Related prior techniques such as adaptive indexing and log-structured merge trees alleviate this problem to some extent, but the overarching issue of providing robust performance while such large maintenance operations are in flight remains an open problem. In a cloud environment, efficient and robust maintenance operations are crucial, because they directly impact vendors’ profits: since users only pay for a given service and attached service-level agreements, well-maintained internal operation can save substantial costs for the vendor.
- Modern hardware: Cloud vendors face the challenge of dealing with a heterogeneous set of hardware devices for different ends (e.g., fast SSDs and non-volatile memory, GPUs, FPGAs, and co-processing units), while offering proper abstractions to those resources with higher-level services. Recent database systems research has focused on optimizing software architectures for different hardware combinations, but we believe that little attention has been paid to how these combinations affect cloud services, or, in other words, how cloud providers should exploit and capitalize on modern hardware, e.g., as premium add-ons to existing services. As with other topics proposed in this seminar series, robustness remains an open challenge: how do we design cloud services that adapt to different hardware combinations, without suffering from performance cliffs or instabilities of service?
- Indexing for data analytics: During the previous instance of this seminar (22111), we investigated for the first time the issue of indexing in data warehouses, particularly cloud-based solutions. There are multiple new avenues of exploration to further advance this topic, including partial and adaptive indexing techniques, automatic and workload-driven creation and disposal of index structures, learned indexes, maintenance of partial replicas that accelerate certain scan predicates, new cost models for index access and maintenance, robust tuple layout reorganization, and robust caching of intermediate results. These topics all touch on the issue of how to keep a low ingest-to-query time – which is critical in data warehouses – while still being able to accelerate queries in a robust manner.
- Scheduling and workload management: The problem of delivering robust performance not for a single query in isolation, but to a set of concurrent queries or a given dynamic workload becomes especially challenging in the cloud, with its multiple layers of virtualization, disaggregated storage, service-level agreements, and different profit models (e.g., bill customers for added guarantees vs. save internal costs). There are issues of scheduling special hardware widgets (if they exist, in limited quantity, for temporary exclusive usage), of scheduling processors and memory and I/O resources, of multi-tenancy and resource-isolation among tenants, of foreground and background processing, and perhaps more. In this new seminar, we plan to investigate more robust ways of scheduling workloads and managing resources in cloud-based data services.
Participants
- Angelos Christos Anadiotis (Oracle Switzerland - Zürich, CH)
- Manos Athanassoulis (Boston University, US)
- Carsten Binnig (TU Darmstadt, DE)
- Thomas Bodner (Hasso-Plattner-Institut, Universität Potsdam, DE)
- Matthias Böhm (TU Berlin, DE)
- Peter A. Boncz (CWI - Amsterdam, NL)
- Nicolas Bruno (Microsoft - Redmond, US)
- Yannis Chronis (Google - Sunnyvale, US)
- Periklis Chrysogelos (Oracle Switzerland - Zürich, CH)
- John Cieslewicz (Google - Mountain View, US)
- Sudipto Das (Amazon Web Services - Seattle, US)
- Thanh Do (Celonis Inc. - New York, US)
- Kira Isabel Duwe (EPFL - Lausanne, CH)
- Jan Finis (Salesforce - München, DE)
- Campbell Fraser (Google - Mountain View, US)
- Goetz Graefe (Google - Madison, US)
- Stefan Halfpap (Technische Universität Berlin, DE)
- Alfons Kemper (TU München - Garching, DE)
- Kyoungmin Kim (EPFL - Lausanne, CH)
- Andrew Lamb (InfluxData - Boston, US)
- Allison Lee (Snowflake - San Mateo, US)
- Viktor Leis (TU München - Garching, DE)
- Lucas Lersch (Amazon Web Services - East Palo Alto, US)
- Boaz Leskes (MotherDuck - Amsterdam, NL)
- Thomas Neumann (TU München - Garching, DE)
- Anisoara Nica (SAP SE - Waterloo, CA)
- Danica Porobic (Oracle Switzerland - Zürich, CH)
- Daniel Ritter (SAP SE - Walldorf, DE)
- Kai-Uwe Sattler (TU Ilmenau, DE)
- Caetano Sauer (Salesforce - München, DE)
- Bernhard Seeger (Universität Marburg, DE)
- Knut Stolze (Ocient - Jena, DE)
- Pinar Tözün (IT University of Copenhagen, DK)
- Nga Tran (InfluxData - Boston, US)
- Immanuel Trummer (Cornell University - Ithaca, US)
- Juliane Waack (Snowflake - Berlin, DE)
- Marcin Zukowski (Snowflake - San Mateo, US)
Related Seminars
- Dagstuhl Seminar 10381: Robust Query Processing (2010-09-19 - 2010-09-24)
- Dagstuhl Seminar 12321: Robust Query Processing (2012-08-05 - 2012-08-10)
- Dagstuhl Seminar 17222: Robust Performance in Database Query Processing (2017-05-28 - 2017-06-02)
- Dagstuhl Seminar 22111: Database Indexing and Query Processing (2022-03-13 - 2022-03-18)
Classification
- Databases
- Distributed, Parallel, and Cluster Computing
- Performance
Keywords
- databases
- cloud computing
- query processing
- indexing
- scheduling