Dagstuhl-Seminar 22132
Graph Embeddings: Theory Meets Practice
(March 27 – March 30, 2022)
Organizers
- Martin Grohe (RWTH Aachen University, DE)
- Stephan Günnemann (TU München, DE)
- Stefanie Jegelka (MIT - Cambridge, US)
- Christopher Morris (McGill University & MILA - Montreal)
Contact
- Andreas Dolzmann (for scientific matters)
- Christina Schwarz (for administrative matters)
Program
Graph-structured data is ubiquitous across application domains ranging from chemo- and bioinformatics to image and social network analysis. To develop successful machine learning algorithms or apply standard data analysis tools in these domains, we need techniques that map the rich information inherent in the graph structure to a vectorial representation in a meaningful way, so-called graph embeddings. Designing such embeddings comes with unique challenges. The embedding has to account for the complex structure of (real-world) networks and additional high-dimensional continuous vectors attached to nodes and edges in a (permutation) invariant way while being scalable to massive graphs or sets of graphs. Moreover, when used in supervised machine learning, the model trained with such embeddings must generalize well to new or previously unseen (graph) instances. Hence, more abstractly, designing graph embeddings results in a trade-off between expressivity, scalability, and generalization.
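The permutation-invariance requirement above can be made concrete with a toy example: a single round of neighborhood aggregation followed by sum pooling (a drastically simplified, hypothetical stand-in for the learned architectures discussed at the seminar) produces the same vector no matter how the nodes are labeled.

```python
def graph_embedding(adj_list, features):
    """Toy permutation-invariant graph embedding: one round of
    neighborhood aggregation followed by sum pooling.  Illustrative
    only; real graph neural networks use learned weight matrices."""
    # Each node aggregates its own feature with its neighbors' features.
    h = {v: features[v] + sum(features[u] for u in adj_list[v])
         for v in adj_list}
    # Summing over all nodes removes any dependence on node ordering.
    return sum(h.values())

# A path graph 0 - 1 - 2 with scalar node features.
adj = {0: [1], 1: [0, 2], 2: [1]}
feat = {0: 1.0, 1: 2.0, 2: 3.0}

# Relabeling the nodes (a permutation) leaves the embedding unchanged.
adj_perm = {2: [0], 0: [2, 1], 1: [0]}
feat_perm = {2: 1.0, 0: 2.0, 1: 3.0}
assert graph_embedding(adj, feat) == graph_embedding(adj_perm, feat_perm)
```

Sum pooling is invariant because addition is commutative; the trade-off mentioned above shows up immediately, since this embedding is also very coarse (many non-isomorphic graphs map to the same value).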
Starting from the 1960s in chemoinformatics, different research communities have worked in the area under various guises, often leading to recurring ideas. Moreover, triggered by the resurgence of (deep) neural networks, there is an ongoing trend in the machine learning community to design invariant/equivariant neural architectures that are capable of dealing with graph- and relational input, both (semi-)supervised and unsupervised, often denoted as graph neural networks. Although successful in practical settings, most of these developments are driven by intuition and empiricism and are geared towards specific application areas. There is no clear understanding of these approaches' limitations and their trade-offs in complexity, expressivity, and generalization. Researchers recently started to leverage connections to graph theory, group theory, logic, combinatorial algorithms, and (algorithmic) learning theory, leading to new theoretical insights and triggering new research in applications. Hence, in this seminar, we aimed to bring together leading applied and theoretical researchers in graph embeddings and adjacent areas, such as graph isomorphism, bio- and chemoinformatics, and graph theory, to facilitate an increased exchange of ideas between these communities. Concretely, we aimed to understand what hinders recent theoretical developments from being adopted in application areas and worked towards a more practical theory. Further, we aimed at understanding both the overarching challenges across applications and the challenges inherent to specific areas, to stimulate directions for further practical and theoretical research.
The seminar brought together 33 researchers from (applied) mathematics, specifically harmonic analysis and (algebraic) topology, (theoretical) computer science, machine learning, bioinformatics, and network science. Eighteen researchers attended remotely owing to the global COVID-19 pandemic. In total, the participants presented 18 talks on their recent progress in a better understanding of graph embeddings, focusing on supervised machine learning, particularly graph neural networks. Many talks dealt with leveraging tools from graph isomorphism testing and related areas such as finite model theory and group theory. In particular, the Weisfeiler-Leman algorithm, a popular heuristic for the graph isomorphism problem, was used to measure the expressivity of the presented algorithms and neural architectures. The consensus was that this algorithm provides too coarse-grained a measure of expressivity, and that new notions of expressivity are needed to develop a thorough understanding. Surprisingly, only a few talks dealt with developing a better understanding of generalization, indicating that the research community still lacks a thorough understanding of generalization in graph learning. Notably, Gitta Kutyniok showed how to leverage random graph models and graphons to analyze the generalization error of graph neural networks, while Bruno Ribeiro talked about the connection between causality and out-of-distribution generalization. Further, some talks used methods from (algebraic) topology and their connection to graph theory to devise provably expressive architectures and to better understand common problems with graph neural networks, e.g., the problem of "over-smoothing" of node representations faced when considering deep architectures. Moreover, two talks covered the challenges of applying graph neural networks to biomedical data and to industrial applications at Google, respectively, indicating a gap between theoretical results and practical architectures.
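For readers unfamiliar with it, the one-dimensional Weisfeiler-Leman algorithm (color refinement) mentioned above can be sketched in a few lines. The example below is an illustrative implementation, not a reference one; it also shows why the resulting expressivity measure is coarse, since 1-WL cannot distinguish a 6-cycle from two disjoint triangles, even though the graphs are non-isomorphic.

```python
from collections import defaultdict

def wl_colors(adj_list, rounds=3):
    """1-dimensional Weisfeiler-Leman (color refinement) sketch.
    Repeatedly refines node colors by combining each node's color
    with the multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj_list}  # uniform initial coloring
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj_list[v])))
            for v in adj_list
        }
        # Relabel the signatures with small integers (deterministically).
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj_list}
    return colors

def wl_histogram(adj_list, rounds=3):
    """Multiset of final colors; equal histograms mean 1-WL fails to
    distinguish the graphs (they may still be non-isomorphic)."""
    hist = defaultdict(int)
    for c in wl_colors(adj_list, rounds).values():
        hist[c] += 1
    return dict(hist)

# Classic failure case: both graphs are 2-regular, so every node keeps
# the same color and 1-WL sees identical color histograms.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
assert wl_histogram(cycle6) == wl_histogram(two_triangles)
```

Since standard message-passing graph neural networks are at most as expressive as this refinement procedure, the same pair of graphs receives identical embeddings from any such architecture, which is exactly the kind of limitation the seminar's expressivity discussions revolved around.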
Concluding Remarks
The seminar was well received, as witnessed by several positive comments from on-site participants. In general, there was an exciting atmosphere at the seminar, particularly among the large number of junior researchers attending on-site, and by many lively discussions during the on-site talks. However, this was not always the case during online talks, and the active participation of online attendees was relatively low. Finally, the organizers wish to express their gratitude to the Scientific Directors of Schloss Dagstuhl – Leibniz Center for Informatics for their support of the seminar.
Participants
- Francesco Di Giovanni (Twitter - San Francisco, US)
- Federico Errica (NEC Laboratories Europe - Heidelberg, DE)
- Fabrizio Frasca (Twitter - London & Imperial College London)
- Floris Geerts (University of Antwerp, BE) [dblp]
- Martin Grohe (RWTH Aachen University, DE) [dblp]
- Stephan Günnemann (TU München, DE) [dblp]
- Nils Kriege (Universität Wien, AT) [dblp]
- Gitta Kutyniok (LMU München, DE) [dblp]
- Haggai Maron (NVIDIA - Tel Aviv, IL)
- Christopher Morris (McGill University & MILA - Montreal) [dblp]
- Gaurav Rattan (RWTH Aachen, DE) [dblp]
- Bruno Ribeiro (Purdue University - West Lafayette, US)
- Bastian Rieck (Helmholtz Zentrum München, DE) [dblp]
- Pascal Schweitzer (TU Darmstadt, DE) [dblp]
- Jan Tönshoff (RWTH Aachen, DE)
- Pablo Barcelo (PUC - Santiago de Chile, CL) [dblp]
- Aleksandar Bojchevski (CISPA - Saarbrücken, DE)
- Joan Bruna Estrach (New York University, US) [dblp]
- Tina Eliassi-Rad (Northeastern University - Boston, US) [dblp]
- Matthias Fey (TU Dortmund, DE)
- Barbara Hammer (Universität Bielefeld, DE) [dblp]
- Stefanie Jegelka (MIT - Cambridge, US) [dblp]
- Elias Khalil (University of Toronto, CA) [dblp]
- Benny Kimelfeld (Technion - Haifa, IL) [dblp]
- Yaron Lipman (Weizmann Institute - Rehovot, IL) [dblp]
- Andreas Loukas (Prescient Design - Schlieren, CH)
- Bryan Perozzi (Google - New York, US) [dblp]
- Siamak Ravanbakhsh (McGill University - Montréal, CA)
- Joshua Robinson (MIT - Cambridge, US)
- Yizhou Sun (UCLA, US)
- Petar Velickovic (DeepMind - London, GB)
- Ulrike von Luxburg (Universität Tübingen, DE) [dblp]
- Marinka Zitnik (Harvard University - Boston, US) [dblp]
Classification
- Discrete Mathematics
- Machine Learning
- Neural and Evolutionary Computing
Keywords
- machine learning for graphs
- graph embedding