Dagstuhl Seminar 19491
Big Graph Processing Systems
(Dec 1 – Dec 6, 2019)
Organizers
- Angela Bonifati (University Claude Bernard - Lyon, FR)
- Alexandru Iosup (VU University Amsterdam, NL)
- Sherif Sakr (University of Tartu, EE)
- Hannes Voigt (Neo4j - Leipzig, DE)
Contact
- Michael Gerke (for scientific matters)
- Jutka Gasiorowski (for administrative matters)
Impacts
- The Future is Big Graphs! : A Community View on Graph Processing Systems : article - Sakr, Sherif; Bonifati, Angela; Voigt, Hannes; Iosup, Alexandru - New York : ACM, 2021. - Pages 62-71 - (Communications of the ACM ; 64. 2021, 9).
- The Future is Big Graphs! : A Community View on Graph Processing Systems : article - 2020 - Sakr, Sherif; Bonifati, Angela; Voigt, Hannes; Iosup, Alexandru - Cornell University : arXiv.org, 2020. - 12 pp.
- Special issue on big graph data management and processing - Bonifati, Angela; Voigt, Hannes - Berlin : Springer, 2022. - pp. 202-346 - (VLDB journal ; 31. 2022, 2).
- Seraph : Continuous Queries on Property Graph Streams : article in Proceedings of the 27th International Conference on Extending Database Technology (EDBT), 25th March - 28th March, 2024 - Rost, Christopher; Tommasini, Riccardo; Bonifati, Angela; Hare, Keith W.; Selmer, Petra; Voigt, Hannes; Plantikow, Stefan; Rahm, Erhard; Valle, Emanuele Della - OpenProceedings.org, 2024. - 14 pp.
In our world, data is not just getting bigger, it is also getting more connected. Exploring, describing, predicting, and explaining phenomena in this interconnected world requires an adequate data abstraction. Graphs are recognized as a general, natural, and flexible data abstraction that can model complex relationships, interactions, and interdependencies between objects. Graphs have been widely used to represent datasets and encode problems across an already extensive range of application domains. The ever-increasing size of graph-structured data for these applications creates a critical need for scalable and even elastic systems that can process large amounts of it efficiently. Additionally, the complexity of using multiple datasets simultaneously in complex analyses raises numerous challenges for graph processing, from new requirements to new capabilities.
This Dagstuhl Seminar is planning to discuss some of these open challenges on the interplay between graph data, abstractions, systems, performance engineering, and software engineering, with a main focus on the following key topics related to big graph processing systems:
- Design Decisions of Big Graph Processing Ecosystems: In modern setups, graph-processing is not a self-sustained, independent activity, but rather part of a larger big-data processing ecosystem with many system alternatives and possible design decisions. We need a clear understanding of the impact and the trade-offs of the various decisions in order to effectively guide the developers of big graph processing applications.
- High-Level Graph Processing Abstractions: While imperative programming models, such as vertex-centric or edge-centric programming models, are popular, they do not offer a high-level interface to the end user. To increase the power of graph processing systems and foster the usage of graph analytics in applications, we need to design high-level graph processing abstractions. It is currently completely open what future declarative graph processing abstractions could look like.
- Application and Domain Specific Requirements: Due to the ubiquity of graph-shaped data, users and applications deal with these data in their daily tasks, from private to professional life. It thus becomes crucial to understand the users' requirements in executing queries and complex analytical tasks on graph data, and to understand their actual usage of these data in production environments.
- Performance and Scalability Evaluation: Traditional measures of performance and scalability, e.g., FLOPS, throughput, or speedup, are difficult to apply to graph processing, especially since performance depends non-trivially on platform, algorithm, and dataset. Moreover, running graph-processing workloads in the cloud raises additional challenges. Addressing such performance-related issues is key to identifying, designing, and building upon widely recognized benchmarks for graph processing.
To address these topics the seminar will bring together researchers, developers, and practitioners actively working on these topics. The aim is to use the insights and the results of the discussions of the seminar to provide a roadmap that can guide the development of several aspects in the future of big graph processing systems.
Press Reviews
- Trip Report on Big Graph Processing Systems Dagstuhl Seminar
Blog entry by Juan Sequeda in Juan Sequeda's Blog, December 11, 2019
In memoriam:
This seminar is dedicated to the memory of our co-organizer and friend Sherif Sakr (1979-2020), whose unexpected early departure happened a few months after the seminar. Sherif was a leading scientist in the field of Big Data Technologies. We are grateful to him for the time spent together and the joint work preceding and following the seminar. He will be deeply missed.
The world has become more interconnected than ever. Through an advancing wave of technologies and applications, our society is producing and consuming data at an unprecedented scale and complexity. To model the data, graphs offer a general model and mathematical abstraction, in the simplest form based on arbitrary objects (vertices) connected by relationships (edges), with possibly additional information (properties). Graphs already enable a remarkable range of application domains, from industry to science, from society to governance, from education to gaming, but their true potential is just beginning to be unlocked. However, the tremendous increase in the size, complexity, and diversity of graph-structured data and their applications, together with the growing community that uses graphs to understand and automate the world around us, raises new challenges for computer science. Under these new circumstances, the potential benefits of graph processing could be canceled by the difficulty of understanding, creating, developing, and automating graph processing for the masses. Focusing on the interplay between graph data, abstractions, systems, performance engineering, and software engineering, this seminar brings together researchers, developers, and practitioners actively working on this topic, to discuss timely and relevant open challenges with a main focus on the following topics: trade-offs of design decisions of big graph processing systems, high-level graph programming abstractions and graph query languages, the specific requirements of different application domains for benchmarking and graph engineering purposes, systems and ecosystems for graph processing, and the fundamental processes and methods leading to the science, design, and engineering of graph processing.
The seminar focused on the following key topics related to big graph processing systems:
Topic 1. Design Decisions of Big Graph Processing Ecosystems: In modern setups, graph processing is not a self-sustained, independent activity, but rather part of a larger big-data processing ecosystem. Typical examples include Giraph's deployment in the Facebook MapReduce ecosystem, PowerGraph in the GraphLab machine learning and data-mining ecosystem, and GraphX in the Apache Spark ecosystem. In general, more alternatives usually mean harder choices. In practice, given the wide spectrum of big graph processing systems currently available, each with different design decisions, it becomes very challenging to decide by intuition which system is the most adequate for given application requirements, workload, or underlying ecosystem. Making such decisions requires significant knowledge about the graph complexity, graph size, world requirements, and even the implementation details of the various available systems. Currently, we lack the fundamental models to understand and quantitatively analyze, estimate, and describe the complexity of big graph processing jobs. In addition, there is no understanding of the relationship between the graph complexity and the computational complexity of big graph processing jobs. Therefore, we need a clear understanding of the impact and the trade-offs of the various decisions (e.g., centralized vs. distributed, partitioning strategy, programming model, graph representation model, memory storage vs. disk storage) in order to effectively guide the developers of big graph processing applications.
Topic 2. High-Level Graph Processing Abstractions: While imperative programming models, such as vertex-centric or edge-centric programming models, are popular, they do not offer a high-level interface to the end user. As a result, the end user must do more technical programming, which limits end-user productivity in building graph processing pipelines. In contrast, graph query languages build on more high-level, declarative constructs. Query language abstractions give more power to the less technical user and allow for extensive performance optimization by the underlying graph processing system. Current graph query languages, however, lack the power required in many graph analytics use cases. To increase the power of graph processing systems and foster the usage of graph analytics in applications, we need to design high-level graph processing abstractions. It is currently completely open what future declarative graph processing abstractions could look like, what the best level of abstraction is, how abstractions for analytics integrate with existing graph query languages, and how we can evaluate new graph processing abstractions regarding utility, simplicity, expressiveness, and optimization potential.
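To make the "technical programming" burden of the vertex-centric model concrete, the following is a minimal single-machine sketch in Python. It is not taken from any system discussed at the seminar: the function name `vertex_centric_components`, the message-passing loop, and the toy edge list are all assumptions chosen for illustration. The sketch imitates a Pregel-style program that labels each vertex with the smallest vertex id in its connected component.

```python
from collections import defaultdict


def vertex_centric_components(edges):
    """Label each vertex with the smallest vertex id in its connected component.

    Hypothetical, single-machine imitation of a vertex-centric program:
    in every superstep, each vertex that received messages reads them,
    updates its own state, and sends messages to its neighbors.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Initial state: every vertex is its own component label.
    label = {v: v for v in neighbors}

    # Superstep 0: every vertex announces its label to its neighbors.
    inbox = defaultdict(list)
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    supersteps = 0
    while inbox:  # stop when no vertex has pending messages
        supersteps += 1
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                # State changed: adopt the smaller label and tell the neighbors.
                label[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
        inbox = outbox
    return label, supersteps


# Toy input (assumed for illustration): two components, {1, 2, 3} and {4, 5}.
edges = [(1, 2), (2, 3), (4, 5)]
labels, steps = vertex_centric_components(edges)
print(labels)
```

Even for this simple task, the programmer has to manage per-vertex state, message routing, and termination explicitly. A declarative abstraction would ideally let the user state only the desired result (the component of each vertex) and leave these details to the system.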
Topic 3. Performance and Scalability Evaluation: Traditionally, performance and scalability are assessed through measures of efficiency that contrast the ability of systems to utilize resources: FLOPS, throughput (e.g., EVPS), or speedup (i.e., compared to either a single-node or a sequential implementation). Such metrics are difficult to apply to graph processing, especially since performance depends non-trivially on platform, algorithm, and dataset (i.e., the PAD triangle). Therefore, many important questions arise: How should the performance of graph-processing systems be compared? How should scalability be defined? Should one compare largely different systems, e.g., a distributed, heterogeneous system with a highly tuned, hand-written sequential implementation? How should a framework for reproducible performance evaluation be designed? Moreover, running graph-processing workloads in the cloud raises additional challenges. First, we would like to understand whether the intrinsic elasticity of the cloud can be harnessed for graph processing. Second, clouds are known to exhibit large performance variability due to colocation and virtualization overheads. Studying the impact of cloud performance variability on graph-processing workloads is another topic of interest. Addressing such performance-related issues is key to identifying, designing, and building upon widely recognized benchmarks for graph processing.
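The metrics named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: EVPS is taken here to mean edges and vertices processed per second, computed as (|V| + |E|) / processing time, and the platform names, dataset name, and timings below are hypothetical placeholders, not measurements from any real system.

```python
def evps(num_vertices, num_edges, processing_seconds):
    """Throughput in edges and vertices per second.

    Assumed definition: (|V| + |E|) / processing time.
    """
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return (num_vertices + num_edges) / processing_seconds


def speedup(baseline_seconds, system_seconds):
    """Speedup of a system relative to a baseline (e.g., a sequential run)."""
    if baseline_seconds <= 0 or system_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / system_seconds


# Hypothetical runs (placeholders only): the same graph yields different
# throughput depending on the platform and the algorithm.
runs = [
    # (platform, algorithm, dataset, |V|, |E|, seconds)
    ("platform-A", "BFS", "graph-small", 1_000, 9_000, 2.0),
    ("platform-A", "PageRank", "graph-small", 1_000, 9_000, 5.0),
    ("platform-B", "BFS", "graph-small", 1_000, 9_000, 4.0),
]

for platform, algorithm, dataset, n_vertices, n_edges, seconds in runs:
    throughput = evps(n_vertices, n_edges, seconds)
    print(f"{platform:11s} {algorithm:9s} {dataset:12s} EVPS={throughput:.1f}")
```

A single number such as EVPS or speedup is only meaningful together with the platform, algorithm, and dataset it was measured on, which is one reason why comparing graph-processing systems is hard.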
For each topic, the discussion also considered specific and general applications of graph processing, at various volume, velocity, and other dimensions.
The seminar brought together over 40 diverse, high-quality researchers with core expertise from two generally distinct communities, data management and (large-scale) computer systems. The seminar was successful, and addressed in particular topics around graph processing systems: ecosystems, abstractions and other fundamental theory, and performance. To this end, we structured the seminar as follows:
- Prior to the seminar, the co-organizers contacted each participant, eliciting commitment to one or several topics, and ideas for key elements of the discussion.
- During the first day of the seminar, the morning was dedicated to short presentations by each participant, followed by a long break-out session per topic. The presentations allowed the participants to better understand each other's core ideas and keywords, to identify synergies, and to find experts on keywords they were not entirely familiar with.
- For the next two days, each morning challenged at least half of the participants with a tutorial given by a leading expert from the other community, then proceeded with break-out sessions organized per topic, and ended with a plenary session to share the main ideas. The tutorials were given by Tamer Özsu on "Graph Processing: A Panoramic View and Some Open Problems", on behalf of the data management community, and by Antonino Tumeo on "Big Graph Processing: The System Perspective", on behalf of the systems community. The main results of these two days of intense work were a more uniform terminology across the participants, a set of core ideas about challenges (open problems), directions for long-term research, and concrete short-term plans for continuation.
- During the last day of the seminar, the participants finalized the immediate conclusions of the seminar (see Section "In Conclusion: Challenges and Future Directions for Big Graph Processing Systems"), and agreed on the plans for continuation.
Participants
- Khaled Ammar (Thomson Reuters Labs, CA)
- Renzo Angles (University of Talca - Chile, CL)
- Walid Aref (Purdue University - West Lafayette, US)
- Marcelo Arenas (PUC - Santiago de Chile, CL)
- Maciej Besta (ETH Zürich, CH)
- Peter A. Boncz (CWI - Amsterdam, NL)
- Angela Bonifati (University Claude Bernard - Lyon, FR)
- Khuzaima Daudjee (University of Waterloo, CA)
- Emanuele Della Valle (Polytechnic University of Milan, IT)
- Stefania Dumbrava (ENSIIE - Evry, FR)
- Olaf Hartig (Linköping University, SE)
- Bernhard Haslhofer (AIT - Austrian Institute of Technology - Wien, AT)
- Tim Hegeman (VU University Amsterdam, NL)
- Jan Hidders (Birkbeck, University of London, GB)
- Katja Hose (Aalborg University, DK)
- Adriana Iamnitchi (University of South Florida - Tampa, US)
- Alexandru Iosup (VU University Amsterdam, NL)
- Vasiliki Kalavri (Boston University, US)
- Hugo Kapp (Oracle Labs Switzerland - Zürich, CH)
- Wim Martens (Universität Bayreuth, DE)
- M. Tamer Özsu (University of Waterloo, CA)
- Eric Peukert (Universität Leipzig, DE)
- Stefan Plantikow (Neo4j - Berlin, DE)
- Mohamed Ragab (University of Tartu, EE)
- Matei Ripeanu (University of British Columbia - Vancouver, CA)
- Sherif Sakr (University of Tartu, EE)
- Semih Salihoglu (University of Waterloo, CA)
- Christian Schulz (Universität Wien, AT)
- Petra Selmer (Neo4j - London, GB)
- Juan F. Sequeda (data.world - Austin, US)
- Joshua Shinavier (Uber Engineering - Palo Alto, US)
- Gábor Szárnyas (Budapest Univ. of Technology & Economics, HU)
- Riccardo Tommasini (Polytechnic University of Milan, IT)
- Antonino Tumeo (Pacific Northwest National Lab. - Richland, US)
- Alexandru Uta (VU University Amsterdam, NL)
- Ana Lucia Varbanescu (University of Amsterdam, NL)
- Hannes Voigt (Neo4j - Leipzig, DE)
- Hsiang-Yun Wu (TU Wien, AT)
- Nikolay Yakovets (TU Eindhoven, NL)
- Da Yan (The University of Alabama - Birmingham, US)
- Eiko Yoneki (University of Cambridge, GB)
Related Seminars
- Dagstuhl Seminar 25171: Holistic Graph-Processing Systems: Enabling Real-World Scale and Societal Impact (2025-04-21 - 2025-04-25)
Classification
- databases / information retrieval
- world wide web / internet
Keywords
- Big Data
- Big Graphs
- Graph Processing Systems