Dagstuhl Seminar 23031

Frontiers of Information Access Experimentation for Research and Education

(Jan 15 – Jan 20, 2023)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/23031

Organizers

  • Christine Bauer
  • Ben Carterette
  • Nicola Ferro
  • Norbert Fuhr

Summary

Information access – which includes Information Retrieval (IR), Recommender Systems (RS), and Natural Language Processing (NLP) – has a long tradition of relying heavily on experimental evaluation, dating back to the mid-1950s, a tradition that has driven the research and evolution of the field. Today, however, research and development of information access systems face new challenges: such systems are called on to support a much wider and increasingly demanding set of user tasks (informational, educational, and entertainment, to name a few), while research settings and available opportunities have evolved substantially (e.g., better platforms, richer data, but also developments within the scientific culture) and shape the way we do research and experimentation. Consequently, it is critical that the next generation of scientists is equipped with a portfolio of evaluation methods that reflects the field’s challenges and opportunities and helps ensure internal validity (e.g., measures, statistical analyses, and effect sizes that support establishing a trustworthy cause-effect relationship between treatments and outcomes), construct validity (e.g., measuring the right thing rather than a partial proxy), and external validity (e.g., critically assessing to what extent findings hold in other situations, domains, and user groups). A robust portfolio of such methods will contribute to developing more responsible experimental practices.
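
A minimal sketch of this kind of internal-validity analysis, assuming Python with SciPy available and made-up per-topic nDCG scores (none of this is material from the seminar): it pairs a significance test with an effect size when comparing two systems over the same topics.

    # Compare two systems on paired per-topic scores: report a significance
    # test and an effect size together. Scores are invented for illustration.
    from statistics import mean, stdev
    from scipy import stats

    # Hypothetical per-topic nDCG scores of two systems on the same topics.
    system_a = [0.42, 0.55, 0.38, 0.61, 0.47, 0.52, 0.44, 0.58]
    system_b = [0.45, 0.59, 0.41, 0.60, 0.53, 0.57, 0.49, 0.62]

    # Paired t-test: the topic is the experimental unit, so scores are paired.
    t_stat, p_value = stats.ttest_rel(system_b, system_a)

    # Cohen's d for paired samples: mean difference over SD of differences.
    diffs = [b - a for a, b in zip(system_a, system_b)]
    effect_size = mean(diffs) / stdev(diffs)

    print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {effect_size:.2f}")

Reporting the effect size next to the p-value separates "statistically detectable" from "practically meaningful", which is the distinction internal validity is concerned with.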

Therefore, we face two questions: Can we re-innovate how we do research and experimentation in the field, addressing emerging challenges in experimental processes, to develop the next generation of information access systems? And how can a new paradigm of experimentation be leveraged to improve education, giving an adequate basis to the next generation of researchers and developers?

This Dagstuhl Seminar brought together experts from various sub-fields of information access – namely IR, RS, NLP, information science, and human-computer interaction – to create a joint understanding of the problems and challenges presented above, to discuss existing solutions and impediments, and to propose next steps to be pursued in the area.

To stimulate thinking around these themes, prior to the seminar, we challenged participants with the following questions:

  • Which experimentation methodologies are most promising to further develop and create a culture around?
  • In which ways can we address concerns related to Fairness, Accountability, and Transparency (FAccT) in experimentation practices? How can we establish FAccT-E, i.e., FAccT in Experimentation?
  • How can industry and academia better work together on experimentation?
  • How can critical experimentation methods and skills be taught and developed in academic teaching?
  • How can we foster collaboration and run shared infrastructures that enable collaborative and joint experimentation? How can we organize shared evaluation activities that take advantage of new hybrid forms of participation?

We started the seminar week with a series of long and short talks delivered by participants, partly in response to the questions above. This helped establish common ground and understanding and let the topics and themes emerge that participants wished to explore as the main output of the seminar.

This led to the definition of five working groups, which explored challenges, opportunities, and next steps in the following areas:

  • Reality check: The working group identified the main challenges in doing real-world studies in RS and IR research and pointed to best practices and remaining challenges: how to do domain-specific or longitudinal studies, how to recruit the right participants, how to use existing infrastructure or create new infrastructure (including appropriate data representation), and how, why, and what to measure.
  • Human-machine-collaborative relevance judgment frameworks: The working group studied the motivation for using Large Language Models (LLMs) to automatically generate relevance assessments in information retrieval evaluation, and raised research questions about how LLMs can help human assessors with the assessment task, whether machines can replace humans in assessing and annotating, and under which conditions human assessors cannot be replaced by machines (a sketch of such a pipeline follows this list).
  • Overcoming methodological challenges in IR and RS through awareness and education: Given the potential limitations of today’s predominant experimentation practices, we find that we need to better equip the various actors in the scientific ecosystem in terms of scientific methods, and we identify a corresponding set of helpful resources and initiatives, which will allow them to adopt a more holistic perspective when evaluating such systems.
  • Results-blind reviewing: Current review processes place undue emphasis on performance, rejecting papers that focus on insights if they show no performance improvements. We propose a results-blind reviewing process that forces reviewers to put more emphasis on the theoretical background, the hypotheses, the methodological plan, and the analysis plan of an experiment, thus improving the overall quality of accepted papers.
  • Guidance for authors: The Information Retrieval community has over time developed expectations regarding papers, but these expectations are largely implicit. In contrast to adjacent disciplines, efforts in the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) community have been rather sparse, mostly individuals expressing their own views. Drawing on materials from other disciplines, we have built a draft set of guidelines that aims to be understandable, broad, and concise. We believe that our proposal is general and uncontroversial, can be used by the main venues, and can be maintained through an open and continuous effort driven by, and for, the community.
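
One way to make the human-machine-collaborative idea concrete is sketched below, under stated assumptions: llm_judge is a hypothetical placeholder for a real LLM call (a crude term-overlap heuristic keeps the sketch runnable), and a confidence threshold decides which documents still go to human assessors. This illustrates the working group's question; it is not a method proposed at the seminar.

    # Sketch: an LLM proposes relevance labels with a confidence score;
    # only low-confidence documents are routed to human assessors.
    from dataclasses import dataclass

    @dataclass
    class Judgment:
        doc_id: str
        label: int         # e.g., 0 = non-relevant, 1 = relevant
        confidence: float  # model-reported confidence in [0, 1]

    def llm_judge(doc_id: str, topic: str, doc_text: str) -> Judgment:
        # Hypothetical stand-in for an LLM call: in practice, prompt the model
        # with topic and document, then parse a graded label and a confidence.
        terms = topic.lower().split()
        overlap = sum(t in doc_text.lower() for t in terms) / len(terms)
        return Judgment(doc_id, int(overlap >= 0.5), overlap)

    def route(topic, docs, threshold=0.8):
        auto, needs_human = [], []
        for doc_id, text in docs.items():
            j = llm_judge(doc_id, topic, text)
            (auto if j.confidence >= threshold else needs_human).append(j)
        return auto, needs_human

    docs = {"d1": "neural ranking models for web search",
            "d2": "recipes for apple pie"}
    auto, needs_human = route("neural ranking search", docs)
    print(len(auto), "auto-labeled;", len(needs_human), "sent to assessors")

The threshold trades annotation cost against the risk of trusting wrong machine labels, which is exactly the condition under which the group asks whether human assessors can be replaced.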
Copyright Christine Bauer, Ben Carterette, Nicola Ferro, and Norbert Fuhr

Motivation

This Dagstuhl Seminar will address technology-enhanced information access (information retrieval, recommender systems, etc.) and specifically focus on developing more responsible experimental practices that lead to more valid results, both in research and in scientific education.

Information access has a long tradition of relying heavily on experimental evaluation, dating back to the mid-1950s, a tradition that has driven the research and evolution of the field. Today, however, research and development of information access systems face new challenges: such systems are called on to support a much wider and increasingly demanding set of user tasks (informational, educational, entertainment, just to name a few), while research settings and available opportunities have evolved substantially (e.g., better platforms, richer data, but also developments within the scientific culture) and shape the way we do research and experimentation. Consequently, it is critical that the next generation of scientists is equipped with a portfolio of evaluation methods that reflects the field’s challenges and opportunities and helps ensure internal validity (e.g., measures, statistical analyses, and effect sizes that support establishing a trustworthy cause-effect relationship between treatments and outcomes), construct validity (e.g., measuring the right thing rather than a partial proxy), and external validity (e.g., critically assessing to what extent results hold in other situations, domains, and user groups). A robust portfolio of such methods will contribute to developing more responsible experimental practices.

Therefore, we face two questions: Can we re-innovate how we do research and experimentation in the field, addressing emerging challenges in experimental processes, to develop the next generation of information access systems? And how can a new paradigm of experimentation be leveraged to improve education, giving an adequate basis to the next generation of researchers and developers?

To cope with these problems, in this Dagstuhl Seminar we aim to address the following questions, among others:

  • Which experimentation methodologies are most promising to further develop and create a culture around?
  • In which ways can we address concerns related to Fairness, Accountability, and Transparency (FAccT) in experimentation practices? How can we establish FAccT-E, i.e., FAccT in Experimentation?
  • How can industry and academia better work together on experimentation?
  • How can critical experimentation methods and skills be taught and developed in academic teaching?
  • How can we foster collaboration and run shared infrastructures that enable collaborative and joint experimentation? How can we organize shared evaluation activities that take advantage of new hybrid forms of participation?
Copyright Christine Bauer, Ben Carterette, Nicola Ferro, and Norbert Fuhr

Participants
  • Christine Bauer (Utrecht University, NL) [dblp]
  • Joeran Beel (Universität Siegen, DE)
  • Timo Breuer (TH Köln, DE) [dblp]
  • Charles Clarke (University of Waterloo, CA) [dblp]
  • Anita Crescenzi (University of North Carolina - Chapel Hill, US)
  • Gianluca Demartini (The University of Queensland - Brisbane, AU) [dblp]
  • Giorgio Maria Di Nunzio (University of Padova, IT) [dblp]
  • Laura Dietz (University of New Hampshire - Durham, US) [dblp]
  • Guglielmo Faggioli (University of Padova, IT) [dblp]
  • Nicola Ferro (University of Padova, IT) [dblp]
  • Bruce Ferwerda (Jönköping University, SE) [dblp]
  • Maik Fröbe (Friedrich-Schiller-Universität Jena, DE)
  • Norbert Fuhr (Universität Duisburg-Essen, DE) [dblp]
  • Matthias Hagen (Friedrich-Schiller-Universität Jena, DE) [dblp]
  • Allan Hanbury (TU Wien, AT)
  • Claudia Hauff (Spotify - Amsterdam, NL) [dblp]
  • Dietmar Jannach (Alpen-Adria-Universität Klagenfurt, AT) [dblp]
  • Noriko Kando (National Institute of Informatics - Tokyo, JP) [dblp]
  • Evangelos Kanoulas (University of Amsterdam, NL) [dblp]
  • Bart Knijnenburg (Clemson University, US) [dblp]
  • Udo Kruschwitz (Universität Regensburg, DE) [dblp]
  • Birger Larsen (Aalborg University Copenhagen, DK) [dblp]
  • Meijie Li (Universität Duisburg-Essen, DE)
  • Maria Maistro (University of Copenhagen, DK) [dblp]
  • Lien Michiels (University of Antwerp, BE) [dblp]
  • Andrea Papenmeier (Universität Duisburg-Essen, DE) [dblp]
  • Martin Potthast (Universität Leipzig, DE) [dblp]
  • Paolo Rosso (Technical University of Valencia, ES) [dblp]
  • Alan Said (University of Gothenburg, SE) [dblp]
  • Philipp Schaer (TH Köln, DE) [dblp]
  • Christin Seifert (Universität Duisburg-Essen, DE)
  • Ian Soboroff (NIST - Gaithersburg, US) [dblp]
  • Damiano Spina (RMIT University - Melbourne, AU)
  • Benno Stein (Bauhaus-Universität Weimar, DE) [dblp]
  • Nava Tintarev (Maastricht University, NL) [dblp]
  • Julián Urbano (TU Delft, NL) [dblp]
  • Henning Wachsmuth (Universität Paderborn, DE) [dblp]
  • Martijn Willemsen (Eindhoven University of Technology & JADS - ‘s-Hertogenbosch, NL) [dblp]
  • Justin Zobel (The University of Melbourne, AU) [dblp]

Classification
  • Artificial Intelligence
  • Information Retrieval

Keywords
  • Information Access Systems
  • Experimentation
  • Evaluation
  • User Interaction
  • Simulation