Dagstuhl Seminar 26161

Managing Vector Data for Retrieval Augmented Generation: Systems and Algorithms

(Apr 12 – Apr 17, 2026)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/26161

Motivation

There is a surge of dense, high-dimensional, billion-scale vector data generated by deep learning models. These models embed complex, multi-modal data, including text, multimedia, graphs, and tables, into vector representations that aim to preserve semantically meaningful information for downstream tasks such as question answering, recommendation, video search, drug design, and other data science applications. Vector databases (VectorDBs) are optimized specifically for storing and managing high-dimensional vectors. Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model (LLM) so that it references an authoritative knowledge base outside its training data before generating a response. RAG and VectorDBs are two important concepts in natural language processing (NLP) and multi-modal data management that are pushing the boundaries of what AI systems can achieve. A critical component powering RAG models is the vector database, which stores the embeddings and supports fast semantic search during the initial retrieval stage. For RAG models to scale to huge corpora containing billions of text passages and multi-modal knowledge graphs, effective and efficient model fine-tuning, indexing, and querying of vector representations are crucial. This is where highly optimized vector databases and similarity-search libraries, e.g., Weaviate, Chroma, FAISS, Vespa, or Pinecone, come into play: they can store billions of entity or document vectors and answer similarity queries over them with low latency.

More generally, using VectorDBs to power RAG raises critical emerging problems, such as how to generate vector data that effectively fuses multi-modal information; when the geometry of the data preserves semantic information; how to update vectors dynamically; how to support efficient storage, indexing, visualization, scalable querying, and explanation; and how to preserve privacy and fairness. This Dagstuhl Seminar aims to bring together researchers from the emerging areas of RAG, VectorDBs, systems, and applications, providing opportunities for interdisciplinary progress. We plan a traditional mix of invited (and thus well-prepared) presentations from both academia and industry, as well as breakout sessions, a panel, a demo and/or poster session, and a gong show session of 5-minute talks in which participants can showcase relevant ongoing research, visionary ideas, etc., thereby initiating further discussion and cross-disciplinary collaboration.

Topics (non-exhaustive):

  • Vector data generation, geometry, and dynamic updates
  • Retrieval-augmented generation
  • Storing and indexing vector data for RAG
  • Vector databases for knowledge modeling and cross-modal data retrieval
  • Query optimization in vector databases
  • Software-hardware collaborative approaches and cloud data management for vector data
  • Access control, privacy, fairness, data regulations, human-in-the-loop approaches, and explainability in vector data management
  • Applications of vector data, RAG, and LLMs
Copyright Sihem Amer-Yahia, Arijit Khan, Wolfgang Lehner, Sharad Mehrotra, and Wenjie Zhang

Classification
  • Databases

Keywords
  • Vector databases
  • Vector data management
  • Data management and Machine Learning
  • Retrieval-Augmented Generation
  • Similarity search