Dagstuhl Seminar 18251
Database Architectures for Modern Hardware
(Jun 17 – Jun 22, 2018)
Organizers
- Peter A. Boncz (CWI - Amsterdam, NL)
- Goetz Graefe (Google - Madison, US)
- Bingsheng He (National University of Singapore, SG)
- Kai-Uwe Sattler (TU Ilmenau, DE)
Contact
- Michael Gerke (for scientific matters)
- Annette Beyer (for administrative matters)
Impacts
- DPI : The Data Processing Interface for Modern Networks : article in CIDR’19, January 13 - 16, 2019, Asilomar, California - Alonso, Gustavo; Binnig, Carsten; Pandis, Ippokratis; Salem, Kenneth; Skrzypczak, Jan; Stutsman, Ryan; Thostrup, Lasse; Wang, Zeke; Ziegler, Tobias; Wang, Tianzheng - New York : ACM, 2019. - 7 pp.
- Joins on high-bandwidth memory: a new level in the memory hierarchy : article - Pohl, Constantin; Sattler, Kai-Uwe; Graefe, Goetz - Berlin : Springer, 2019. - pp. 1-21 - (VLDB journal ; 2019).
- Waves of Misery After Index Creation : article in BTW2019 : Datenbanksysteme für Business, Technologie und Web : pp. 77-96 - Glombiewski, Nikolaus; Seeger, Bernhard; Graefe, Goetz - Bonn : Gesellschaft für Informatik e.V., 2019 - (Lecture notes in informatics / P ; 289 : article).
In recent years, the social and commercial relevance of efficient data management has made database systems the foundation of almost all complex software systems. As a result, architectural patterns for database systems that rest on assumptions about classic hardware setups are widely accepted. However, current database concepts and systems are not well prepared to support emerging application domains such as eSciences, the Internet of Things, or Digital Humanities. From a user's perspective, flexible domain-specific query languages, or at least access interfaces, are required; novel data models for these application domains have to be integrated; and consistency guarantees that reduce flexibility and performance should be adaptable to the requirements. Finally, the volume, variety, veracity, and velocity of data caused by ubiquitous sensors have to be mastered through massive scalability and online processing while retaining traditional qualities of database systems such as consistency, isolation, and descriptive query languages. At the same time, current and future hardware trends provide new opportunities such as many-core CPUs with hundreds of compute cores, special-purpose computing units such as GPUs and FPGAs, novel storage technologies such as non-volatile memory and advanced solid-state devices, as well as high-speed networks based on 10 Gbit/s Ethernet or InfiniBand that already support direct access to the memory of a remote node. Moreover, heterogeneous hardware designs such as coupled CPU-FPGA and CPU-GPU architectures represent a trend toward close integration between classic and emerging hardware.
Thus, the goal of this Dagstuhl Seminar is to bring together researchers and practitioners from these areas, representing both the software and the hardware side and therefore different disciplines, to foster cross-cutting architectural discussions. During the seminar, the participants discuss opportunities and challenges in exploiting features of modern hardware and operating system primitives for data processing, as well as in supporting and accelerating modern database processing through hardware technology. A dialogue between the different disciplines allows for a push-and-pull principle: database researchers may learn about research opportunities arising from current developments in emerging hardware, and may in turn propose requirements for lower-level hardware and software components. The seminar extends the series of previous Dagstuhl seminars on database systems aspects, such as "Robust Query Processing" (10381, 12321, 17222) as well as "Databases on Future Hardware" (17101).
The seminar will start with a focus on specific problems of hardware-accelerated database processing and on prior results. Thereafter, seminar participants will work in small groups on selected cross-cutting problems. Possible topics are specific query processing techniques for FPGAs, GPUs, and many-core processors; exploiting new memory and storage technologies for indexing and recovery; hardware support for concurrency control and transaction management; and the impact of high-speed networks on distributed database architectures. Every day, interleaved plenary presentations and discussions will re-focus the working groups. Towards the end of the week, we may have concrete ideas for new techniques and perhaps for publications. Both academic and industrial participants may freely use the discussion contents and results for follow-on work.
In recent years, the social and commercial relevance of efficient data management has made database systems the foundation of almost all complex software systems. As a result, architectural patterns for database systems that rest on assumptions about classic hardware setups are widely accepted. However, current database concepts and systems are not well prepared to support emerging application domains such as eSciences, the Internet of Things, or Digital Humanities. From a user's perspective, flexible domain-specific query languages, or at least access interfaces, are required; novel data models for these application domains have to be integrated; and consistency guarantees that reduce flexibility and performance should be adaptable to the requirements. Finally, the volume, variety, veracity, and velocity of data caused by ubiquitous sensors have to be mastered through massive scalability and online processing while retaining traditional qualities of database systems such as consistency, isolation, and descriptive query languages. At the same time, current and future hardware trends provide new opportunities such as:
- many-core CPUs: Next-generation CPUs will provide hundreds of compute cores even in the commodity range. To allow high degrees of parallelism, some architectures already provide hardware support for the necessary synchronization, e.g. transactional memory. However, it is not yet clear how to fully utilize these degrees of parallelism and these synchronization mechanisms for database processing.
- co-processors like GPU and FPGA: Special-purpose computing units such as GPUs and FPGAs allow for much higher degrees of parallelism and can accelerate compute-intensive tasks significantly. Moreover, heterogeneous hardware designs such as coupled CPU-FPGA and CPU-GPU architectures represent a trend toward close integration between classic and emerging hardware. However, such designs require new architectural concepts for data management.
- novel storage technologies like NVRAM and SSD: Even modern in-memory database systems rely mostly on block-based media (e.g. SSD and HDD) to ensure persistence of data. Emerging memory technologies such as non-volatile memory (NVRAM) promise byte-addressable persistence with latencies close to DRAM. Currently, this technology is discussed mainly for instant failure recovery of databases, but the role of NVRAM in future data management system architectures is still open.
- high-speed networks: In both scale-up and scale-out scenarios, efficient interconnects play a crucial role. Today, high-speed networks based on 10 Gbit/s Ethernet or InfiniBand already support remote DMA (RDMA), i.e. direct access to the memory of a remote node. However, this requires dealing with distributed systems properties (unreliability, locality), and it is still unclear how database systems can utilize this mechanism.
To open up the application domains mentioned above as examples, and to exploit the potential of future hardware generations, it is now necessary to fundamentally rethink current database architectures.
One of the main challenges of this rethinking is that it requires expertise from different research disciplines: hardware design, computer architectures, networking, operating systems, distributed systems, software engineering, and database systems.
Thus, the goal of this Dagstuhl Seminar was to bring together researchers and practitioners from these areas, representing both the software and the hardware side and therefore different disciplines, to foster cross-cutting architectural discussions. In this way, the seminar extended the series of previous Dagstuhl seminars on database systems aspects, such as "Robust Query Processing" (10381, 12321, 17222) as well as "Databases on Future Hardware" (17101).
The seminar was organized into six working groups where the participants discussed opportunities and challenges in order to exploit different features of modern hardware and operating system primitives for data processing:
- Database accelerators: Based on an analysis of use cases for database accelerators from the level of individual operators and algorithms up to the level of complex database tasks, the group discussed ways of exploiting and evaluating accelerator technologies as well as future research directions with respect to hardware acceleration in databases.
- Memory hierarchies: The group discussed design recipes for database nodes with non-trivial memory hierarchies containing not only disk and RAM but also non-volatile memory. Within such a hierarchy, different caching strategies are employed: exclusive caching for functionally equivalent levels and inclusive caching for levels with different functionality.
- Remote direct memory access: The group discussed ways of exploiting RDMA in data-intensive applications. In particular, the group proposed DPI, an interface providing a set of useful abstractions for network-aware data-intensive processing. Similar to MPI, DPI is designed as an interface that can have multiple implementations for different networking technologies to enable the exploitation of RDMA and in-network processing.
- Heterogeneous database architectures: This topic was addressed by two working groups. Both groups discussed a database software architecture that is capable of using multiple hardware devices (GPU, TPU, FPGA, ASICs) in addition to the CPU for handling database workloads. The principal goal was an architecture that would never be worse than a state-of-the-art CPU-centered database architecture, but would benefit significantly on those workloads where the heterogeneous devices can exploit their strengths. The first group developed a morsel-driven architecture, where pipelines are broken up into sub-pipelines and adaptive execution strategies are exploited. The second group discussed operating system support and primitives for heterogeneous architectures.
- Machine learning in database systems: The goal of this working group was to investigate the application of machine learning methods for estimating operator selectivities as part of query optimization. Such an approach could overcome the inaccuracies of traditional cost estimation techniques, especially for queries with complex predicates and multiple joins.
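The distinction between exclusive and inclusive caching mentioned for the memory hierarchies group can be illustrated with a toy two-level cache. This is a minimal sketch under stated assumptions, not the group's design: the class `TwoLevelCache`, its LRU replacement policy, and the `load` callback are all hypothetical names chosen for illustration. In exclusive mode a page lives in exactly one level and is demoted when evicted from the fast level; in inclusive mode every page in the fast level also has a copy in the slow level.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy hierarchy (hypothetical): a small fast level over a larger slow level."""

    def __init__(self, fast_capacity, slow_capacity, inclusive):
        self.fast = OrderedDict()  # LRU order: oldest entry first
        self.slow = OrderedDict()
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity
        self.inclusive = inclusive

    @staticmethod
    def _put(level, capacity, key, value):
        """Insert into one level; return the evicted (key, value) or None."""
        level[key] = value
        level.move_to_end(key)
        if len(level) > capacity:
            return level.popitem(last=False)
        return None

    def access(self, key, load):
        """Return the value for key, calling load(key) on a miss in both levels."""
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if self.inclusive:
            if key in self.slow:
                value = self.slow[key]
                self.slow.move_to_end(key)
            else:
                value = load(key)
                evicted = self._put(self.slow, self.slow_capacity, key, value)
                if evicted is not None:
                    # Back-invalidate so the fast level stays a subset of the slow one.
                    self.fast.pop(evicted[0], None)
            # A victim of the fast level is simply dropped: its copy is still in slow.
            self._put(self.fast, self.fast_capacity, key, value)
        else:
            # Exclusive: promotion moves the page, eviction demotes the victim.
            value = self.slow.pop(key) if key in self.slow else load(key)
            victim = self._put(self.fast, self.fast_capacity, key, value)
            if victim is not None:
                self._put(self.slow, self.slow_capacity, *victim)
        return value
```

With equal capacities, the exclusive variant holds up to `fast_capacity + slow_capacity` distinct pages, while the inclusive variant holds at most `slow_capacity`; the price of the extra capacity is that every promotion and eviction moves data between levels.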
The progress and outcomes of the individual working groups were presented in a daily plenary session; details of the results are given below.
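As a rough illustration of the morsel-driven idea mentioned for the heterogeneous database architectures groups, the following sketch splits the input into small chunks (morsels) that idle workers pull from a shared queue. It shows only the dispatch pattern under stated assumptions: the function `run_morsel_driven`, its parameters, and the worker names are hypothetical, the workers are plain threads standing in for devices, and sub-pipelines and adaptive execution strategies are not modeled.

```python
import queue
import threading


def run_morsel_driven(rows, pipeline, workers, morsel_size=4):
    """Split rows into morsels; each worker pulls the next morsel when idle."""
    morsels = queue.Queue()
    for start in range(0, len(rows), morsel_size):
        morsels.put((start, rows[start:start + morsel_size]))

    results = {}  # morsel start offset -> output rows
    processed_by = {name: 0 for name in workers}
    lock = threading.Lock()

    def work(name):
        while True:
            try:
                start, morsel = morsels.get_nowait()
            except queue.Empty:
                return  # all morsels were enqueued up front, so we are done
            out = [pipeline(row) for row in morsel]
            with lock:
                results[start] = out
                processed_by[name] += 1

    threads = [threading.Thread(target=work, args=(name,)) for name in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Reassemble the output in input order.
    ordered = [row for start in sorted(results) for row in results[start]]
    return ordered, processed_by


# Worker names are placeholders for devices; no real GPU dispatch happens here.
output, counts = run_morsel_driven(list(range(10)), lambda x: x * x,
                                   ["cpu0", "cpu1", "gpu0"])
```

Because workers pull morsels rather than receiving a fixed share, a faster worker naturally processes more morsels, which is one reason this pattern is attractive when devices differ in speed.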
References
- Gustavo Alonso, Michaela Blott, Jens Teubner: Databases on Future Hardware (Dagstuhl Seminar 17101). Dagstuhl Reports 7(3):1–18 (2017)
- Renata Borovica-Gajic, Goetz Graefe, Allison Lee: Robust Performance in Database Query Processing (Dagstuhl Seminar 17222). Dagstuhl Reports 7(5):169–180 (2017)
- Goetz Graefe, Wey Guy, Harumi A. Kuno, Glenn N. Paulley: Robust Query Processing (Dagstuhl Seminar 12321). Dagstuhl Reports 2(8):1–15 (2012)
- Goetz Graefe, Arnd Christian König, Harumi Anne Kuno, Volker Markl, Kai-Uwe Sattler: Robust Query Processing. Dagstuhl Seminar Proceedings 10381, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Germany 2010
Participants
- Anastasia Ailamaki (EPFL - Lausanne, CH)
- Gustavo Alonso (ETH Zürich, CH)
- Witold Andrzejewski (Poznan University of Technology, PL)
- Carsten Binnig (TU Darmstadt, DE)
- Peter A. Boncz (CWI - Amsterdam, NL)
- Philippe Bonnet (IT University of Copenhagen, DK)
- Sebastian Breß (DFKI - Berlin, DE)
- Holger Fröning (Universität Heidelberg - Mannheim, DE)
- Goetz Graefe (Google - Madison, US)
- Bingsheng He (National University of Singapore, SG)
- Alfons Kemper (TU München, DE)
- Thomas Leich (HS Harz - Wernigerode, DE)
- Viktor Leis (TU München, DE)
- Daniel Lemire (University of Québec - Montreal, CA)
- Justin Levandoski (Amazon Web Services - Seattle, US)
- Stefan Manegold (CWI - Amsterdam, NL)
- Klaus Meyer-Wegener (Universität Erlangen-Nürnberg, DE)
- Onur Mutlu (ETH Zürich, CH)
- Thomas Neumann (TU München, DE)
- Anisoara Nica (SAP SE - Waterloo, CA)
- Ippokratis Pandis (Amazon Web Services - Palo Alto, US)
- Andrew Pavlo (Carnegie Mellon University - Pittsburgh, US)
- Thilo Pionteck (Universität Magdeburg, DE)
- Holger Pirk (MIT - Cambridge, US)
- Danica Porobic (Oracle Labs - Redwood Shores, US)
- Gunter Saake (Universität Magdeburg, DE)
- Ken Salem (University of Waterloo, CA)
- Kai-Uwe Sattler (TU Ilmenau, DE)
- Caetano Sauer (Tableau - München, DE)
- Bernhard Seeger (Universität Marburg, DE)
- Evangelia Sitaridi (Amazon.com, Inc. - Palo Alto, US)
- Jan Skrzypczak (Zuse Institute Berlin, DE)
- Olaf Spinczyk (TU Dortmund, DE)
- Ryan Stutsman (University of Utah - Salt Lake City, US)
- Jürgen Teich (Universität Erlangen-Nürnberg, DE)
- Tianzheng Wang (Simon Fraser University - Burnaby, CA)
- Zeke Wang (ETH Zürich, CH)
- Marcin Zukowski (Snowflake Computing Inc. - San Mateo, US)
Classification
- data bases / information retrieval
- data structures / algorithms / complexity
- hardware
Keywords
- Database systems
- Computer Architecture
- Hardware Support for Databases
- Co-Processors
- Non-Volatile Memory