Dagstuhl Seminar 24211

Evaluation Perspectives of Recommender Systems: Driving Research and Education

(May 20 – 24, 2024)


Permalink
Please use the following short URL to link to this page: https://www.dagstuhl.de/24211


Summary

Recommender systems (RS) have become essential tools in everyday life, efficiently helping users discover relevant, useful, and interesting items such as music tracks, movies, or social matches. RS identify the interests and preferences of individual users through explicit input or implicit information inferred from their interactions with the systems and tailor content and recommendations accordingly [13, 16].
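
As a purely illustrative aside (not part of the seminar report), the following minimal Python sketch uses hypothetical toy data to show the implicit-feedback case described above: an interaction log is turned into per-user preference counts, and items are recommended by simple co-occurrence with what similar users played. All names in the sketch (events, song_a, recommend) are made up for illustration.

    # Minimal sketch, assuming hypothetical toy data: derive implicit
    # preferences from an interaction log and recommend items that a
    # similar user interacted with. Illustration only, not a method
    # from the seminar report.
    from collections import defaultdict

    # Hypothetical interaction log: (user, item, event). Explicit feedback
    # would instead be a rating the user actively provided.
    events = [
        ("u1", "song_a", "play"), ("u1", "song_b", "play"),
        ("u1", "song_b", "play"), ("u2", "song_b", "play"),
        ("u2", "song_c", "play"),
    ]

    # Implicit preference signal: how often each user played each item.
    prefs = defaultdict(lambda: defaultdict(int))
    for user, item, event in events:
        if event == "play":
            prefs[user][item] += 1

    # Naive recommendation: collect unseen items from users who share at
    # least one item with the target user, ranked by their play counts.
    def recommend(target, prefs):
        seen = set(prefs[target])
        candidates = defaultdict(int)
        for other, items in prefs.items():
            if other == target or not seen & set(items):
                continue
            for item, count in items.items():
                if item not in seen:
                    candidates[item] += count
        return sorted(candidates, key=candidates.get, reverse=True)

    print(recommend("u1", prefs))  # ['song_c']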

Evaluation of RS requires attention at every phase of the system life cycle, including design, development, and continuous improvement during operation. High-quality evaluation is crucial for a system’s success in practice. This evaluation can focus on the core performance of the system or encompass the entire context in which it is used [3, 7, 8, 10]. Research typically differentiates between system-centric and user-centric evaluation. System-centric evaluation examines algorithmic aspects, such as the predictive accuracy of recommender algorithms. In contrast, user-centric evaluation assesses the user’s perspective, including perceived quality and user experience. Comprehensive evaluation must address both aspects since high predictive accuracy does not necessarily meet user expectations [12].
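
To make the system-centric side of this distinction concrete, here is a minimal sketch (plain Python, with hypothetical toy data and made-up item IDs) that computes two common accuracy-oriented metrics, Precision@k and NDCG@k, for one user's ranked recommendations against held-out relevant items. As noted above, such metrics capture predictive accuracy only; they say nothing about perceived quality or user experience.

    # Minimal sketch, assuming a held-out offline test split with binary
    # relevance judgments; item IDs and numbers are hypothetical.
    import math

    def precision_at_k(ranked, relevant, k):
        """Fraction of the top-k recommended items that are relevant."""
        hits = sum(1 for item in ranked[:k] if item in relevant)
        return hits / k

    def ndcg_at_k(ranked, relevant, k):
        """Normalized discounted cumulative gain with binary relevance."""
        dcg = sum(1.0 / math.log2(i + 2)
                  for i, item in enumerate(ranked[:k]) if item in relevant)
        ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
        return dcg / ideal if ideal > 0 else 0.0

    # Hypothetical ranked list from some recommender and the user's
    # held-out relevant items.
    ranked_items = ["m17", "m03", "m42", "m08", "m55"]
    relevant_items = {"m03", "m08", "m99"}

    print(precision_at_k(ranked_items, relevant_items, k=5))  # 0.4
    print(ndcg_at_k(ranked_items, relevant_items, k=5))       # ~0.50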

The topic of evaluation, with all its challenges, has recently received considerable attention. The PERSPECTIVES workshops (organized at ACM RecSys 2021–2023 [14, 15, 11] and co-organized by this seminar’s organizers) were highly popular and attracted many participants. This interest is further evidenced by the special issue on evaluation in ACM Transactions on Recommender Systems [1]. Recent calls for more impactful RS research [5, 6, 12, 9] highlight that current evaluation practices are too narrow and may not be practically relevant. Chia et al. [4] advocate for more nuanced evaluation methods that meet industry demands, and Patro et al. [9] argue that current practices are insufficient as they often overlook side effects or longitudinal impacts. A recent systematic literature study further reveals that current evaluation methods are limited in experiment design, dataset choice, and evaluation metrics [2].

This seminar on evaluation perspectives of RS brought together researchers and practitioners from diverse backgrounds. Its aim was to examine current challenges and advance the ongoing discussion on RS evaluation. The seminar began with eight presentations addressing current challenges in evaluation. These talks initiated the general discussion and helped form groups around the emerging topics. As a result, five working groups were established, each focusing on one of the following areas:

Working Group 1: Theory of Evaluation

This group focused on the theoretical foundations of RS evaluation. They began by identifying the shortcomings of current evaluation practices and linking these issues to underlying theoretical principles. Key challenges discussed included the selection and configuration of evaluation metrics and the reporting of evaluation results. Section 4.1 of the full report outlines the challenges and theoretical perspectives identified in this group.

Working Group 2: Fairness Evaluation

This group focused on exploring paradigms and practices for evaluating the fairness of RS. Given the specific nature of fairness metrics and evaluation requirements for different applications, fairness problems, and goals, the group proposed “best meta-practices”, a set of approaches to planning, executing, and communicating rigorous fairness evaluation scenarios. The group’s outcome is documented in Section 4.2 of the full report.

Working Group 3: Best-Practices for Offline Evaluations of Recommender Systems

This working group addressed the topic of offline evaluation, with a specific focus on identifying problems and best practices for this evaluation method. They concentrated on pinpointing the primary challenges related to reproducibility and methodology. Subsequently, they provided guidelines to address these challenges from various perspectives, including those of paper authors, reviewers, editors, and program chairs, as summarized in Section 4.3 of the full report.

Working Group 4: Multistakeholder and Multimethod Evaluation

This group examined the challenges and complexities in evaluating multistakeholder scenarios, discussing the key aspects that must be considered in such a nuanced environment. Additionally, they explored the transition from theoretical evaluation frameworks to practical implementation. Section 4.4 of the full report outlines this work.

Working Group 5: Evaluating the Long-Term Impact of Recommender Systems

This working group concentrated on the long-term perspective and impact of RS and their evaluation. This includes developing suitable long-term measures and conducting social and behavioral research to understand and account for human behavior, long-term stakeholder goals, and the corresponding metrics. Additionally, the group examined practical challenges in evaluating the long-term aspects and impact of RS. This work is presented in Section 4.5 of the full report.

References

  1. Christine Bauer, Alan Said, and Eva Zangerle. Introduction to the special issue on perspectives on recommender systems evaluation. ACM Transactions on Recommender Systems, 2(1), March 2024. URL https://doi.org/10.1145/3648398.
  2. Christine Bauer, Eva Zangerle, and Alan Said. Exploring the landscape of recommender systems evaluation: Practices and perspectives. ACM Transactions on Recommender Systems, 2(1), March 2024. URL https://doi.org/10.1145/3629170.
  3. Joeran Beel, Stefan Langer, Marcel Genzmehr, Bela Gipp, Corinna Breitinger, and Andreas Nürnberger. Research paper recommender system evaluation: a quantitative literature survey. In Proceedings of the international workshop on reproducibility and replication in recommender systems evaluation, pages 15–22, 2013.
  4. Patrick John Chia, Jacopo Tagliabue, Federico Bianchi, Chloe He, and Brian Ko. Beyond NDCG: behavioral testing of recommender systems with RecList. In Companion Proceedings of the Web Conference 2022, pages 99–104, 2022.
  5. Dietmar Jannach and Christine Bauer. Escaping the McNamara Fallacy: Towards more impactful recommender systems research. AI Magazine, 41(4):79–95, December 2020. ISSN 2371-9621, 0738-4602.
  6. Paolo Cremonesi and Dietmar Jannach. Progress in recommender systems research: Crisis? What crisis? AI Magazine, 42(3):43–54, 2021.
  7. Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5–53, January 2004. ISSN 1046-8188. URL https://doi.org/10.1145/963770.963772.
  8. Dietmar Jannach, Oren Sar Shalom, and Joseph A. Konstan. Towards more impactful recommender systems research. In ImpactRS@RecSys, 2019.
  9. Gourab K Patro, Lorenzo Porcaro, Laura Mitchell, Qiuyue Zhang, Meike Zehlike, and Nikhil Garg. Fair ranking: a critical review, challenges, and future directions. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency, pages 1929–1942, 2022.
  10. Alan Said, Domonkos Tikk, Klara Stumpf, Yue Shi, Martha A. Larson, and Paolo Cremonesi. Recommender systems evaluation: A 3D benchmark. In RUE@RecSys, pages 21–23, 2012.
  11. Alan Said, Eva Zangerle, and Christine Bauer, editors. Third Workshop: Perspectives on the Evaluation of Recommender Systems (PERSPECTIVES 2023), RecSys ’23, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400702419. URL https://doi.org/10.1145/3604915.3608748.
  12. Eva Zangerle and Christine Bauer. Evaluating recommender systems: survey and framework. ACM Computing Surveys, 55(8):1–38, 2022.
  13. Bo Xiao and Izak Benbasat. E-commerce product recommendation agents: Use, characteristics, and impact. MIS Quarterly, pages 137–209, 2007.
  14. Eva Zangerle, Christine Bauer, and Alan Said, editors. Perspectives on the Evaluation of Recommender Systems (PERSPECTIVES), RecSys ’21, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450384582. URL https://doi.org/10.1145/3460231.3470929.
  15. Eva Zangerle, Christine Bauer, and Alan Said, editors. Second Workshop: Perspectives on the Evaluation of Recommender Systems (PERSPECTIVES 2022), RecSys ’22, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450392785. URL https://doi.org/10.1145/3523227.3547408.
  16. Francesco Ricci, Lior Rokach, and Bracha Shapira. Recommender Systems Handbook. Springer New York, NY, 3rd edition, 2022.
Copyright Christine Bauer, Alan Said, and Eva Zangerle

Motivation

Evaluation is an important cornerstone in the process of researching, developing, and deploying recommender systems. This Dagstuhl Seminar aims to shed light on the different and potentially diverging or contradictory perspectives on the evaluation of recommender systems. Building on the discussions and outcomes of the PERSPECTIVES workshop series held at ACM RecSys 2021-2023, the seminar will bring together academia and industry to critically reflect on the state of the evaluation of recommender systems and create a setting for development and growth.

While recommender systems research is largely applied, evaluating these systems builds on and intersects theories from information retrieval, machine learning, and human-computer interaction. Historically, the theories and evaluation approaches in these fields differ substantially, and thoroughly evaluating recommender systems requires integrating all of these perspectives. Hence, this seminar will bring together experts from these fields and serve as a vehicle for discussing and developing the state of the art and practice of evaluating recommender systems. The seminar will set the ground for developing recommender systems evaluation metrics, methods, and practices through collaborations and discussions between participants from diverse backgrounds, e.g., academic and industry researchers, industry practitioners, both senior and junior. We emphasize the importance of getting and keeping the big picture of a recommender system’s performance in its context of use, for which it is essential to incorporate both the technical and the human element.

We will set the basis for the next generation of researchers, equipped to evaluate and advance recommender systems thoroughly.

Copyright Christine Bauer, Alan Said, and Eva Zangerle

Participants

  • Gediminas Adomavicius (University of Minnesota - Minneapolis, US) [dblp]
  • Vito Walter Anelli (Politecnico di Bari, IT)
  • Andrea Barraza-Urbina (Grubhub - New York, US)
  • Christine Bauer (Paris Lodron Universität Salzburg, AT) [dblp]
  • Joeran Beel (Universität Siegen, DE)
  • Alejandro Bellogín (Autonomous University of Madrid, ES)
  • Toine Bogers (IT University of Copenhagen, DK) [dblp]
  • Peter Brusilovsky (University of Pittsburgh, US) [dblp]
  • Robin Burke (University of Colorado - Boulder, US)
  • Wanling Cai (Trinity College - Dublin, IE & Lero, the Science Foundation Ireland - Limerick, IE)
  • Tommaso Di Noia (Politecnico di Bari, IT) [dblp]
  • Michael D. Ekstrand (Drexel University - Philadelphia, US) [dblp]
  • Kim Falk (Copenhagen, DK)
  • Andres Ferraro (Pandora, US)
  • Bart Goethals (University of Antwerp, BE)
  • Neil Hurley (University College Dublin, IE)
  • Dietmar Jannach (Alpen-Adria-Universität Klagenfurt, AT) [dblp]
  • Olivier Jeunen (ShareChat - London, GB)
  • Joseph Konstan (University of Minnesota - Minneapolis, US) [dblp]
  • Dominik Kowald (Know Center - Graz, AT & TU Graz, AT) [dblp]
  • Maria Maistro (University of Copenhagen, DK) [dblp]
  • Lien Michiels (University of Antwerp, BE) [dblp]
  • Julia Neidhardt (TU Wien, AT)
  • Özlem Özgöbek (NTNU - Trondheim, NO)
  • Denis Parra (PUC - Santiago de Chile, CL)
  • Sole Pera (TU Delft, NL) [dblp]
  • Lorenzo Porcaro (EC Joint Research Centre - Ispra, IT)
  • Alan Said (University of Gothenburg, SE) [dblp]
  • Rodrygo Santos (Federal University of Minas Gerais - Belo Horizonte, BR)
  • Guy Shani (Ben Gurion University - Beer Sheva, IL) [dblp]
  • Manel Slokom (TU Delft, NL)
  • Annelien Smets (Vrije Universiteit Brussel, BE)
  • Barry Smyth (University College Dublin, IE) [dblp]
  • Marko Tkalcic (University of Primorska, SI)
  • Helma Torkamaan (TU Delft, NL)
  • Alexander Tuzhilin (New York University, US) [dblp]
  • Tobias Vente (Universität Siegen, DE)
  • Robin Verachtert (DPG Media - Antwerpen, BE)
  • Lukas Wegmeth (Universität Siegen, DE)
  • Martijn Willemsen (TU Eindhoven, NL) [dblp]
  • Jürgen Ziegler (Universität Duisburg-Essen, DE) [dblp]

Classification
  • Human-Computer Interaction
  • Information Retrieval
  • Machine Learning

Keywords
  • Recommender Systems
  • Evaluation
  • Information Retrieval
  • User Interaction
  • Intelligent Systems