Dagstuhl Seminar 24172
Code Search
(Apr 21 – Apr 24, 2024)
Organizers
- Satish Chandra (Google - Mountain View, US)
- Michael Pradel (Universität Stuttgart, DE)
- Kathryn T. Stolee (North Carolina State University - Raleigh, US)
Contact
- Marsha Kleinbauer (for scientific matters)
- Simone Schilke (for administrative matters)
Code search describes the process of retrieving source code from a repository, where that source code matches a query. Whether a developer is looking for where an error was thrown, learning how to use a new-to-them API, learning a new programming language, or browsing their team’s directory to familiarize themselves with the codebase, search underpins all these activities. Beyond those human-driven software engineering processes, search is also a component in automated software engineering, such as automated program repair, code example recommendation, and clone detection. Furthermore, new generative AI tools have challenged traditional code search by presenting alternative approaches to finding and reusing code.
Code search research has implications for developer productivity, code quality, and software engineering ethics, and tools to facilitate code search are widely available. Some are internal to companies (e.g., Google has invested substantially in this), others index open-source code (e.g., GitHub offers a search interface for public repositories), and still others generate code to match a user query (e.g., ChatGPT). Students and professionals also use generic web search to find source code examples. Across these platforms, query formats, indexing, ranking, the origin of the code, and use cases all vary. This provides many avenues for innovation and exploration in code search research.
For example, what is the appropriate scope for a search result? This question has implications for the underlying technology (e.g., should the indexed unit be a file, function, sub-function, or something else?) and for the use case (e.g., does the user want to adapt the code to their context? Are they seeking to understand a codebase? Or something else?). There are many other questions worth exploring: How should source code be indexed? Which search results should appear first? Are there artifacts beyond the code itself that should be surfaced, such as diffs against previous versions or documentation? What diversity of results should be shown to the user? What are the ethical considerations with code search, and with code search vs. code generation?
This Dagstuhl Seminar brings together experts in mining software repositories, human factors in software engineering, software documentation, code examples, program analysis, and industrial code search systems to bridge the gap between industry and academia and set the roadmap for the next decade of code search research.
Expected outcomes of this seminar include: new ideas on how to better support developers in searching for code across different user segments (e.g., industrial, open source software, student populations, developers with low language familiarity), clarity on how search can help during different stages of software development (e.g., writing new code, debugging existing issues, reviewing code), a better understanding of code search ethics, and guidelines for more rigorous, repeatable evaluations for code search research.
Participants
- Boris Bokowski (Google - München, DE)
- José Cambronero (Microsoft - Redmond, US)
- Satish Chandra (Google - Mountain View, US)
- Jürgen Cito (TU Wien, AT)
- Luca Di Grazia (Universität Stuttgart, DE)
- Elena Leah Glassman (Harvard University - Allston, US)
- Georgios Gousios (TU Delft, NL)
- Reid Holmes (University of British Columbia - Vancouver, CA)
- Ciera Jaspan (Google - Mountain View, US)
- Tobias Kiecker (HU Berlin, DE)
- Dongsun Kim (Kyungpook National University, KR)
- Miryung Kim (University of California at Los Angeles, US & Amazon Web Services - Palo Alto, US)
- Jens Krinke (University College London, GB)
- Julia Lawall (INRIA - Paris, FR)
- Gabriel Matute (University of California - Berkeley, US)
- Alexander Neubeck (GitHub - San Francisco, US)
- Michael Pradel (Universität Stuttgart, DE)
- Nikitha Rao (Carnegie Mellon University - Pittsburgh, US)
- Kathryn T. Stolee (North Carolina State University - Raleigh, US)
- Christoph Treude (The University of Melbourne, AU)
- Jan Van den Bussche (Hasselt University, BE)
- Rijnard van Tonder (Mysten Labs - Palo Alto, US)
- Bogdan Vasilescu (Carnegie Mellon University - Pittsburgh, US)
- Cristina Videira Lopes (University of California - Irvine, US)
- Tobias Welp (Google - München, DE)
- Bowen Xu (North Carolina State University - Raleigh, US)
- Svetlana Zemlyanskaya (JetBrains GmbH - München, DE)
Classification
- Software Engineering
Keywords
- code search
- developer tools