Dagstuhl Seminar 26161: Managing Vector Data for Retrieval Augmented Generation: Systems and Algorithms

(Apr 12 – Apr 17, 2026)

Permalink

Please use the following short URL to reference this page: https://www.dagstuhl.de/26161

Organizers

Sihem Amer-Yahia (University of Grenoble, FR)
Arijit Khan (Aalborg University, DK)
Wolfgang Lehner (TU Dresden, DE)
Sharad Mehrotra (University of California - Irvine, US)
Wenjie Zhang (UNSW - Sydney, AU)

Contact

Andreas Dolzmann (for scientific matters)
Simone Schilke (for administrative matters)

Motivation

Deep learning models now generate a surge of dense, high-dimensional, billion-scale vector data. These models embed complex, multi-modal data, including text, multimedia, graphs, and tables, into vector representations that aim to preserve semantically meaningful information for downstream tasks such as question answering, recommendation, video search, drug design, and other data science applications. Vector databases (VectorDBs) are optimized specifically for the storage and management of high-dimensional vectors. Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model (LLM) so that it references an authoritative knowledge base outside of its training data sources before generating a response. RAG and VectorDBs are two important concepts in natural language processing (NLP) and multi-modal data management that are pushing the boundaries of what AI systems can achieve.

A critical component that powers RAG models is the vector database, which stores the embeddings for fast semantic search during the initial retrieval stage. For RAG models to scale to a huge corpus containing billions of text passages and multi-modal knowledge graphs, effective and efficient model fine-tuning, indexing, and querying of vector representations are crucial. This is where highly optimized vector databases, e.g., Weaviate, Chroma, FAISS, Vespa, or Pinecone, come into play. They allow storing billions of entity or document vectors for low-latency similarity search.

More generally, the use of VectorDBs to power RAG raises emerging critical problems: how to generate vector data that effectively fuses multi-modal information; when the geometry of the data preserves semantic information; how to update vector data dynamically; how to support efficient storage, indexing, visualization, scalable querying, and explanation; and how to preserve privacy and fairness. This Dagstuhl Seminar aims to bring together researchers from the emerging areas of RAG, VectorDBs, systems, and applications, providing opportunities for interdisciplinary progress. We plan a traditional mix of invited (and thus well-prepared) presentations from both academia and industry, breakout sessions, a panel, a demo and/or poster session, and a gong show session of 5-minute talks in which participants can showcase relevant ongoing research, visionary ideas, etc., thereby initiating more discussion and cross-disciplinary collaboration.

Topics (non-exhaustive):

Vector data generation, geometry, and dynamic updates
Retrieval-augmented generation
Storing and indexing vector data for RAG
Vector databases for knowledge modeling and cross-modal data retrieval
Query optimization in vector databases
Software-hardware collaborative approaches and cloud data management for vector data
Access control, privacy, fairness, data regulations, human-in-the-loop approaches, and explainability in vector data management
Applications of vector data, RAG, and LLMs

Creative Commons BY 4.0

Sihem Amer-Yahia, Arijit Khan, Wolfgang Lehner, Sharad Mehrotra, and Wenjie Zhang

Classification

Databases

Keywords

Vector databases
Vector data management
Data management and Machine Learning
Retrieval Augmented Generation
Similarity search
