Dagstuhl Seminar 25182: Challenges and Opportunities of Table Representation Learning

( Apr 27 – May 02, 2025 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/25182

Organizers

Carsten Binnig (TU Darmstadt, DE)
Julian Martin Eisenschlos (Google Research - Zürich, CH)
Madelon Hulsebos (CWI - Amsterdam, NL)
Frank Hutter (Prior Labs - Freiburg, DE & ELLIS Institute Tübingen, DE & Universität Freiburg, DE)

Contact

Michael Gerke (for scientific matters)
Susanne Bach-Bernhard (for administrative matters)

Impacts

SemBench: A Benchmark for Semantic Query Processing Engines - Lao, Jiale; Zimmerer, Andreas; Ovcharenko, Olga; Özcan, Fatma; Cochez, Michael; Trummer, Immanuel; Kipf, Andreas; Schelter, Sebastian; Jagadish, H. V.; Kissel, Kris; Cong, Tianji; Russo, Matthew; Vitagliano, Gerardo; Gupta, Gautam; Hottelier, Thibaud - Cornell University: arXiv.org, 2025. - 14 pp.

Motivation

The increasing amount of data being collected, stored, and analyzed creates a need for efficient, scalable, and robust methods to handle this data. Representation learning, i.e., the practice of leveraging neural networks to obtain generic representations of data objects, has been shown to be effective for various applications over data modalities such as images and text. More recently, representation learning has shown impressive initial capabilities on structured data (e.g., relational tables in databases) for a limited set of tasks in data management and analysis, such as data cleaning, insight retrieval, and data analytics. Most of these applications traditionally relied on heuristics and statistics, which are limited in robustness, scale, and accuracy. The ability to learn abstract representations across tables has unlocked new opportunities, such as pretrained models for data augmentation and machine learning, that address these limitations. This emerging research area, which we refer to as Table Representation Learning (TRL), is receiving increasing interest from industry as well as academia, in particular in the communities of data management, machine learning, and natural language processing.

This growing interest is a result of the high potential impact of TRL in industry, given the abundance of tables in the organizational data landscape, the large range of high-value applications relying on tables, and the early state of TRL research so far. Recently, specialized TRL models for embedding (relational) tables, as well as prompting methods for LLMs over structured data residing in databases, have been developed and shown to be effective for various tasks, e.g., data preparation, machine learning, and question answering. However, studies have revealed shortcomings of existing models regarding their ability to capture the structure of tables, the relationships among tables, and the heterogeneity (e.g., numbers, dates, text), biases, and semantics of the data contents, as well as limited generalization to new domains and unaddressed privacy constraints. These challenges are merely the first limitations surfaced so far, and we expect to identify more limitations of existing approaches through discussions, talks, and hands-on sessions at the TRL Dagstuhl Seminar.

As we stand at the starting point of developing and adopting high-capacity neural models (e.g., through representation or generative learning) for structured data, there is a wide range of applications that have not been addressed yet. For example, pretrained models for tabular machine learning have been explored only to a limited extent, whereas “upstream” data management applications, such as automated data validation and query and schema optimization, have not been explored so far. Therefore, another objective of this Dagstuhl Seminar is to identify novel application areas, build first prototypes to assess their potential, and develop research agendas towards further exploration of these applications. Moreover, beyond these unexplored applications, we aim to develop a manifesto that brings forward a common long-term vision for TRL, with moon-shot ideas and the road to get there, which requires perspectives from experts in academia and industry.

Creative Commons BY 4.0

Carsten Binnig, Julian Martin Eisenschlos, Madelon Hulsebos, and Frank Hutter

Participants

Carsten Binnig (TU Darmstadt, DE) [dblp]
Vadim Borisov (tabularis.ai - Tübingen, DE) [dblp]
Shuaichen Chang (Amazon Web Services - New York, US) [dblp]
Michael Cochez (VU Amsterdam, NL) [dblp]
Tianji Cong (University of Michigan - Ann Arbor, US) [dblp]
Katharina Eggensperger (Universität Tübingen, DE) [dblp]
Julian Martin Eisenschlos (Google Research - Zürich, CH) [dblp]
Floris Geerts (University of Antwerp, BE) [dblp]
Filip Gralinski (Snowflake - Warsaw, PL)
Madelon Hulsebos (CWI - Amsterdam, NL) [dblp]
Frank Hutter (Prior Labs - Freiburg, DE & ELLIS Institute Tübingen, DE & Universität Freiburg, DE) [dblp]
Myung Jun Kim (INRIA Saclay - Île-de-France, FR)
Andreas Kipf (TU Nürnberg, DE)
Tassilo Klein (SAP SE - Walldorf, DE)
Xue Li (CWI - Amsterdam, NL)
Andreas Müller (Microsoft Corp. - Mountain View, US) [dblp]
Olga Ovcharenko (TU Berlin, DE)
Fatma Özcan (Google - San Jose, US) [dblp]
Paolo Papotti (EURECOM - Biot, FR) [dblp]
Lennart Purucker (Universität Freiburg, DE) [dblp]
Anupam Sanghi (TU Darmstadt, DE) [dblp]
Sebastian Schelter (TU Berlin, DE) [dblp]
Shivam Sharma (TU Darmstadt, DE)
Immanuel Trummer (Cornell University - Ithaca, US) [dblp]
Gaël Varoquaux (INRIA Saclay - Île-de-France, FR) [dblp]
Gerardo Vitagliano (MIT - Cambridge, US)
Liane Vogel (TU Darmstadt, DE) [dblp]

Classification

Artificial Intelligence
Computation and Language
Databases

Keywords

Representation and Generative Learning for Data Management and Analysis
Applications of Table Representation Learning
Benchmarks and Datasets for Table Representation Learning
Pre-trained (Language) Models for Tables and Databases
