Dagstuhl Seminar 24052
Reviewer No. 2: Old and New Problems in Peer Review
(Jan 28 – Feb 02, 2024)
Organizers
- Iryna Gurevych (TU Darmstadt, DE)
- Anna Rogers (IT University of Copenhagen, DK)
- Nihar Shah (Carnegie Mellon University - Pittsburgh, US)
Contact
- Christopher Michels (for scientific matters)
- Christina Schwarz (for administrative matters)
Background
Peer review is the best mechanism we have so far for assessing the scientific validity of new research. But this mechanism has many well-known issues, such as the differing incentives of authors and reviewers, the difficulty of preserving reviewer and author anonymity to avoid social biases [1, 2, 3, 4, 5], and confirmation and other cognitive biases [6, 7, 8, 9, 10] that even researchers fall prey to. These intrinsic problems are exacerbated in interdisciplinary fields like Natural Language Processing (NLP), where groups of researchers may differ so much in their methodology, terminology, and research agendas that they sometimes have trouble even recognizing each other's contributions as "research" [11].
Our Dagstuhl Seminar covered a range of topics related to the organization of peer review in NLP, Machine Learning (ML), and Artificial Intelligence venues more broadly, as well as to intelligent support for peer reviewing, including the following:
- Improving paper-reviewer matching through processes/algorithms that take into account both topic matches and reviewer interest in a given research question (see the sketch after this list).
- Peer review vs methodological and demographic diversity in the field.
- Better practices for designing review forms and peer review policies.
- Improving the structural incentives for reviewers.
- Use of NLP and ML for intelligent peer reviewing support: increasing the quality and efficiency of peer review, opportunities and challenges.
- Peer-reviewing and research integrity.
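As an illustration of the paper-reviewer matching item above, the following is a minimal sketch of how a topical-similarity score and self-reported reviewer interest ("bids") might be combined into a single affinity score and turned into an assignment. The data, the 0.7/0.3 weights, and the scoring are illustrative assumptions, not a mechanism proposed or endorsed at the seminar; systems used by large conferences additionally handle reviewer load limits, conflicts of interest, and richer similarity models.

```python
# Minimal sketch, assuming TF-IDF profiles and toy data: combine topical
# similarity with reviewer-reported interest ("bids") into one affinity
# score, then compute a one-to-one assignment. Real venues add reviewer
# load limits and conflict-of-interest constraints on top of this.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = [
    "neural machine translation for low-resource languages",
    "incentive mechanisms for conference peer review",
]
reviewer_profiles = [
    "multilingual NLP and machine translation",
    "game theory and mechanism design, peer review experiments",
]
# bids[r, p]: reviewer r's self-reported interest in paper p (0 = none, 3 = eager)
bids = np.array([[3, 0],
                 [0, 3]])

# Topical similarity between reviewer profiles and paper abstracts.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(reviewer_profiles + papers)
n_rev = len(reviewer_profiles)
topic_sim = cosine_similarity(tfidf[:n_rev], tfidf[n_rev:])

# Weighted combination of topic match and interest (weights are assumptions).
affinity = 0.7 * topic_sim + 0.3 * (bids / bids.max())

# Maximize total affinity via the Hungarian algorithm (negate for a cost matrix).
rev_idx, pap_idx = linear_sum_assignment(-affinity)
for r, p in zip(rev_idx, pap_idx):
    print(f"reviewer {r} -> paper {p} (affinity {affinity[r, p]:.2f})")
```

For realistic settings with multiple reviewers per paper and per-reviewer load caps, the one-to-one assignment step would be replaced by an integer or linear program over the same affinity matrix.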
Goals
We intended for the seminar to serve as a point of reflection on the participants' decades of personal experience in organizing different kinds of peer-reviewed venues in NLP and beyond, enabling an in-depth discussion of what has been tried, what seems to work, and what doesn't. The objectives of the seminar included collaborative research on the methodological challenges of peer review, on NLP and ML for intelligent peer-reviewing support, and on actionable proposals (for example, paper-reviewer assignment policies and peer-reviewing guidelines and workflows), informed by the participants' experience as chairs, editors, conference organizers, and reviewers.
Outcomes
The seminar was attended by researchers at different levels of seniority and from a variety of research backgrounds. While a large number of the attendees represented the Natural Language Processing community, about a third represented other communities within the broader sphere of Machine Learning. Most discussions focused on peer review in the world of ultra-large conferences with thousands of submissions, but we also had a senior representative from fields where journals are the most prominent venues, and hence an opportunity to learn from their experience.
Knowledge Sharing
The seminar started with contributed talks by a diverse group of participants (see Section 3), which allowed us to share experience and research findings pertinent to the topics of the seminar across communities. Peer review issues are mostly discussed in the business meetings of specific conferences, and there are hardly any opportunities to share this knowledge across communities. Hence, this knowledge-sharing part of the seminar was unique in itself, and it proved useful for establishing common ground and points of reference for the subsequent work during the seminar.
Problem elucidation
After the contributed talks, all the subsequent work was organized into breakout sessions (two running in parallel) on the following topics:
- Integrity issues in peer review (2 sessions)
- Diversity issues in peer review (3 sessions)
- Assisting peer review with NLP (3 sessions)
- Peer review policies (2 sessions)
- Incentives in peer review (3 sessions)
- Paper-reviewer matching (3 sessions)
The work in all these sessions combined brainstorming, establishing common ground and terminology, discussing practical solutions to specific problems that had been tried in the various communities represented by the participants, and collecting ideas for the future. Summaries of the work on all the above topics are provided in Section 4 of the full report.
There were also two slots reserved for unstructured breakouts, and every day concluded with an overall summary session in which the leads for the various topics summarized that day's discussions.
Research program and community formation
The key outcome of the seminar is a white paper with the working title "What Can NLP Do for Peer Review?", co-authored by the majority of the seminar participants. It formulates the goals and research agenda for assisting peer review with NLP techniques, and we hope that it will play a key role in shaping this research field. The paper is available at [12]. It is accompanied by a repository for tracking research papers in this area, available at https://github.com/OAfzal/nlp-for-peer-review.
Concrete policies
The work in the various breakout sessions culminated in the proposal of a new peer review committee for the Association for Computational Linguistics (ACL) that would oversee systematic research and data-driven peer review policy development in the NLP community. This proposal has already been formally submitted to the ACL board and has received general approval. The work on formally establishing and announcing the committee will be finished in 2024.
Research problems and collaborations
This Dagstuhl Seminar also helped surface and crystallize a number of open problems and, alongside, helped establish interdisciplinary collaborations for working on them that might not have happened without this seminar.
Next steps
This Dagstuhl Seminar brought together an international community of NLP and ML researchers from academia and industry to discuss the problems with peer review at large-scale conferences. This is a topic on which various subcommunities have different practices, expectations, and strong opinions, and it prompted much discussion throughout all days of the seminar (and long into the night). The seminar was also a unique opportunity to share lessons learned the hard way on issues that are often misconstrued as merely organizational. In fact, peer review deserves serious treatment as a research problem, one for which much conceptual and empirical work is still needed.
We hope that this seminar was the first in a series of events devoted to this topic, and that this inaugural event proves pivotal in the formation of a cohesive research community. The white paper prepared as the main outcome of this seminar aims to galvanize the NLP and ML communities by offering them a wide selection of realistic research problems with peer review as an application area.
References
1. Jürgen Huber, Sabiou Inoua, Rudolf Kerschbamer, Christian König-Kersting, Stefan Palan, and Vernon L. Smith. Nobel and novice: Author prominence affects peer review. Proceedings of the National Academy of Sciences, 119(41):e2205779119, October 2022.
2. Inna Smirnova, Daniel M. Romero, and Misha Teplitskiy. The bias-reducing effect of voluntary anonymization of authors' identities: Evidence from peer review, January 2023.
3. Andrew Tomkins, Min Zhang, and William D. Heavlin. Reviewer bias in single- versus double-blind peer review. Proceedings of the National Academy of Sciences, 114(48):12708–12713, 2017.
4. Charvi Rastogi, Ivan Stelmakh, Xinwei Shen, Marina Meila, Federico Echenique, Shuchi Chawla, and Nihar Shah. To ArXiv or not to ArXiv: A study quantifying pros and cons of posting preprints online. arXiv preprint arXiv:2203.17259, 2022.
5. Emaad Manzoor and Nihar B. Shah. Uncovering latent biases in text: Method and application to peer review. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 4767–4775, 2021.
6. Jian Wang, Reinhilde Veugelers, and Paula Stephan. Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Research Policy, 46(8):1416–1436, October 2017.
7. J. A. Garcia, Rosa Rodriguez-Sánchez, and J. Fdez-Valdivia. Confirmatory bias in peer review. Scientometrics, 123(1):517–533, April 2020.
8. David M. Allen and James W. Howell, editors. Groupthink in Science: Greed, Pathological Altruism, Ideology, Competition, and Culture. Springer International Publishing, Cham, 2020.
9. Carole J. Lee. Commensuration bias in peer review. Philosophy of Science, 82(5):1272–1283, 2015.
10. Ivan Stelmakh, Nihar B. Shah, Aarti Singh, and Hal Daumé III. Prior and prejudice: The novice reviewers' bias against resubmissions in conference peer review. Volume 5, pages 1–17. ACM, New York, NY, USA, 2021.
11. Anna Rogers and Isabelle Augenstein. What can we do to improve peer review in NLP? In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1256–1262. Association for Computational Linguistics, November 2020.
12. Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, and Iryna Gurevych. What can natural language processing do for peer review? CoRR, abs/2405.06563, 2024.
Motivation
Peer review is the best mechanism we have so far for assessing the scientific validity of new research. But this mechanism has many well-known issues, such as the differing incentives of authors and reviewers, the difficulty of preserving reviewer and author anonymity, and confirmation and other cognitive biases that even researchers fall prey to. These intrinsic problems are exacerbated in interdisciplinary fields like Natural Language Processing (NLP) and Machine Learning (ML), where groups of researchers may differ so much in their methodology, terminology, and research agendas that they sometimes have trouble even recognizing each other's contributions as "research".
This Dagstuhl Seminar will cover a range of topics related to the organization of peer review in NLP and ML, including the following:
- Improving the paper-reviewer matching by processes/algorithms that take into account both topic matches and reviewer interest in a given research question
- Peer review vs methodological and demographic diversity in the interdisciplinary fields
- Better practices for designing peer-review policies
- Improving the structural incentives for reviewers
- Use of NLP and ML for suitable automation of (parts of) the paper reviewing process
- Peer-reviewing and research integrity
The seminar will serve as a point of reflection on the participants' decades of personal experience in organizing different kinds of peer-reviewed venues, enabling an in-depth discussion of what has been tried, what seems to work, and what doesn't. It will also incorporate the fast-improving capabilities of NLP/ML systems. The outcomes of the seminar may include joint research publications on the methodological challenges of peer review, on NLP and ML for intelligent support of peer reviewing, and actionable proposals informed by the experience of the participants as researchers as well as in various roles, including chairs, editors, conference organizers, reviewers, and authors.
Participants
- Osama Mohammed Afzal (MBZUAI - Abu Dhabi, AE)
- Koen Dercksen (Radboud University Nijmegen, NL)
- Nils Dycke (TU Darmstadt, DE)
- Alexander Goldberg (Carnegie Mellon University - Pittsburgh, US)
- Iryna Gurevych (TU Darmstadt, DE)
- Jason Hartline (Northwestern University - Evanston, US)
- Tom Hope (The Hebrew University of Jerusalem, IL)
- Dirk Hovy (Bocconi University - Milan, IT)
- Eddie Kohler (Harvard University - Allston, US)
- Jonathan Kummerfeld (The University of Sydney, AU)
- Ilia Kuznetsov (TU Darmstadt, DE)
- Anne Lauscher (Universität Hamburg, DE)
- Kevin Leyton-Brown (University of British Columbia - Vancouver, CA)
- Sheng Lu (TU Darmstadt, DE)
- Dorsa Majdi (Sharif University of Technology - Tehran, IR)
- Mausam (Indian Institute of Technology - New Delhi, IN)
- Bahar Mehmani (Elsevier BV - Amsterdam, NL)
- Margot Mieskes (Hochschule Darmstadt, DE)
- Aurélie Névéol (CNRS - Orsay, FR)
- Danish Pruthi (Indian Institute of Science - Bangalore, IN)
- Lizhen Qu (Monash University - Clayton, AU)
- Anna Rogers (IT University of Copenhagen, DK)
- Roy Schwartz (The Hebrew University of Jerusalem, IL)
- Nihar Shah (Carnegie Mellon University - Pittsburgh, US)
- Noah A. Smith (University of Washington - Seattle, US)
- Thamar Solorio (MBZUAI - Abu Dhabi, AE)
- Jingyan Wang (Georgia Institute of Technology, US)
- Xiaodan Zhu (Queen's University - Kingston, CA)
Classification
- Artificial Intelligence
- Computation and Language
- Machine Learning
Keywords
- peer review
- diversity
- natural language processing
- incentives