Dagstuhl-Seminar 24122
Low-Dimensional Embeddings of High-Dimensional Data: Algorithms and Applications
(Mar 17 – Mar 22, 2024)
Organizers
- Fred Hamprecht (Universität Heidelberg, DE)
- Dmitry Kobak (Universität Tübingen, DE)
- Smita Krishnaswamy (Yale University - New Haven, US)
- Gal Mishne (University of California, San Diego - La Jolla, US)
Contact
- Andreas Dolzmann (for scientific matters)
- Simone Schilke (for administrative matters)
Dagstuhl Reports
As part of the mandatory documentation, participants are asked to submit their talk abstracts, working group results, etc. for publication in our series Dagstuhl Reports via the Dagstuhl Reports Submission System.
- Upload (Use personal credentials as created in DOOR to log in)
Dagstuhl Seminar Wiki
- Dagstuhl Seminar Wiki (Use personal credentials as created in DOOR to log in)
Shared Documents
- Dagstuhl Materials Page (Use personal credentials as created in DOOR to log in)
Low-dimensional embeddings are widely used for unsupervised data exploration across many scientific fields, from single-cell biology to artificial intelligence. These fields routinely deal with high-dimensional characterization of millions of objects, and the data often contain rich structure with hierarchically organised clusters, progressions, and manifolds. Researchers increasingly use 2D embeddings (t-SNE, UMAP, autoencoders, etc.) to get an intuitive understanding of their data and to generate scientific hypotheses or follow-up analysis plans. With so many scientific insights hinging on these visualisations, it becomes urgent to examine the current state of these techniques mathematically and algorithmically.
This Dagstuhl Seminar intends to bring together machine learning researchers working on algorithm development, mathematicians interested in provable guarantees, and practitioners applying embedding methods in biology, chemistry, humanities, social science, etc. Our aim is to bring together the world's leading experts to (i) survey the state of the art; (ii) identify critical shortcomings of existing methods; (iii) brainstorm ideas for the next generation of methods; and (iv) forge collaborations to help make these a reality.
This seminar should lay the groundwork for future methods that rise to the challenge of visualising high-dimensional data sets while emphasising their idiosyncrasies and scaling to tens of millions, hundreds of millions, and potentially billions of data points.
Seminar topics:
- Manifold assumption and manifold learning.
- Spectral methods, diffusion, Laplacian methods, etc.
- Relationships and trade-offs between different embedding algorithms.
- Limitations and shortcomings of low-dimensional embeddings. Danger of over-interpretation.
- Local, global, and hierarchical structure preservation.
- Non-Euclidean embeddings, such as hyperbolic or spherical.
- Low-dimensional (~2) vs. mid-range (~256) embeddings: unique challenges.
- Low-dimensional embeddings in actual practice: embeddings of cells, molecules, texts, graph nodes, images, etc. Data modalities and their challenges.
- Scaling up for larger datasets, runtime considerations.
- Self-supervised embeddings via contrastive learning.
- Theoretical guarantees and mathematical properties of unsupervised and self-supervised embeddings.
- Topological data analysis in the embedding space; topological constraints on embeddings.
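To make the spectral/Laplacian topic above concrete, here is a minimal sketch of Laplacian eigenmaps in pure NumPy — a toy illustration of one embedding method in this family, not code from the seminar. The binary k-nearest-neighbour weights, the data, and all parameter values are illustrative assumptions.

```python
import numpy as np

def laplacian_eigenmap(X, k=10, dim=2):
    """Embed rows of X into `dim` dimensions via unnormalised Laplacian eigenmaps (toy sketch)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Symmetric k-nearest-neighbour adjacency; binary weights for simplicity
    # (a Gaussian kernel on distances is the more common choice).
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1 : k + 1]  # skip self (distance 0)
        W[i, nn] = 1.0
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(1)) - W  # unnormalised graph Laplacian
    # eigh returns eigenvalues in ascending order; skip the trivial
    # constant eigenvector (eigenvalue 0) and take the next `dim`.
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1 : dim + 1]

# Synthetic data: two well-separated 20-dimensional Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 20)), rng.normal(6, 1, (60, 20))])
Y = laplacian_eigenmap(X)
print(Y.shape)  # (120, 2)
```

Methods like t-SNE and UMAP refine this picture by replacing the global eigenproblem with attraction–repulsion optimisation over a similar neighbourhood graph.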
Participants
- Michael Bleher (Universität Heidelberg, DE)
- Kerstin Bunte (University of Groningen, NL)
- Corinna Coupette (MPI für Informatik - Saarbrücken, DE)
- Sebastian Damrich (Universität Tübingen, DE)
- Cyril de Bodt (University of Louvain, BE)
- Alex Diaz-Papkovich (Brown University - Providence, US)
- Laleh Haghverdi (Max-Delbrück-Centrum - Berlin, DE)
- Fred Hamprecht (Universität Heidelberg, DE)
- Ágnes Horvát (Northwestern University - Evanston, US)
- Dmitry Kobak (Universität Tübingen, DE)
- Dhruv Kohli (University of California - San Diego, US)
- Smita Krishnaswamy (Yale University - New Haven, US)
- John Aldo Lee (UC Louvain-la-Neuve, BE)
- B.P.F. Lelieveldt (Leiden University Medical Center, NL)
- Leland McInnes (Tutte Institute for Mathematics & Computing - Ottawa, CA)
- Gal Mishne (University of California, San Diego - La Jolla, US)
- Ian Nabney (University of Bristol, GB)
- Maximilian Noichl (Utrecht University, NL)
- Pavlin Policar (University of Ljubljana, SI)
- Bastian Rieck (Helmholtz Zentrum München, DE)
- Enrique Fita Sanmartin (Universität Heidelberg, DE)
- Benjamin M. Schmidt (Nomic AI - New York, US)
- Ingo Scholtes (Universität Würzburg, DE)
- Guy Wolf (University of Montreal, CA & MILA - Montreal, CA)
- Miguel Á. Carreira-Perpiñán (University of California - Merced, US)
Classification
- Data Structures and Algorithms
- Machine Learning
Keywords
- dimensionality reduction
- visualization
- high-dimensional