Dagstuhl Seminar 25182

Challenges and Opportunities of Table Representation Learning

(Apr 27 – May 02, 2025)

Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/25182

Organizers
  • Carsten Binnig
  • Julian Martin Eisenschlos
  • Madelon Hulsebos
  • Frank Hutter

Contact

Motivation

The increasing amount of data being collected, stored, and analyzed induces a need for efficient, scalable, and robust methods to handle this data. Representation learning, i.e., the practice of leveraging neural networks to obtain generic representations of data objects, has proven effective for various applications over data modalities such as images and text. More recently, representation learning has shown impressive initial capabilities on structured data (e.g., relational tables in databases) for a limited set of tasks in data management and analysis, such as data cleaning, insight retrieval, and data analytics. Most of these applications have traditionally relied on heuristics and statistics, which are limited in robustness, scale, and accuracy. The ability to learn abstract representations across tables has unlocked new opportunities, such as pretrained models for data augmentation and machine learning, that address these limitations. This emerging research area, which we refer to as Table Representation Learning (TRL), receives increasing interest from both industry and academia, in particular from the data management, machine learning, and natural language processing communities.

This growing interest is a result of the high potential impact of TRL in industry, given the abundance of tables in the organizational data landscape, the large range of high-value applications relying on tables, and the early state of TRL research. Recently, specialized TRL models for embedding (relational) tables, as well as prompting methods for LLMs over structured data residing in databases, have been developed and shown to be effective for various tasks, e.g., data preparation, machine learning, and question answering. However, studies have revealed shortcomings of existing models regarding their ability to capture the structure of tables, the relationships among tables, and the heterogeneity (e.g., numbers, dates, text), biases, and semantics of the data contents, as well as their limited generalization to new domains and unaddressed privacy constraints. These are merely the first limitations surfaced so far, and we expect to identify more limitations of existing approaches through discussions, talks, and hands-on sessions at the TRL Dagstuhl Seminar.

As we stand at the starting point of developing and adopting high-capacity neural models (e.g., through representation or generative learning) for structured data, a wide range of applications has not been addressed yet. For example, pretrained models for tabular machine learning have been explored only to a limited extent, whereas “upstream” data management applications, such as automated data validation and query and schema optimization, have not been explored so far. Therefore, another objective of this Dagstuhl Seminar is to identify novel application areas, build first prototypes to assess their potential, and develop research agendas towards further exploration of these applications. Moreover, beyond these unexplored applications, we aim to develop a manifesto that brings forward a common long-term vision for TRL, with moon-shot ideas and the road to get there, which requires perspectives from experts in both academia and industry.

Copyright Carsten Binnig, Julian Martin Eisenschlos, Madelon Hulsebos, and Frank Hutter

Classification
  • Artificial Intelligence
  • Computation and Language
  • Databases

Keywords
  • Representation and Generative Learning for Data Management and Analysis
  • Applications of Table Representation Learning
  • Benchmarks and Datasets for Table Representation Learning
  • Pre-trained (Language) Models for Tables and Databases