Dagstuhl Seminar 25441
Competitions and Empirical Evaluations in Automated Reasoning
(Oct 26 – Oct 31, 2025)
Organizers
- Johannes Klaus Fichte (Linköping University, SE)
- Matti Järvisalo (University of Helsinki, FI)
- Aina Niemetz (Stanford University, US)
- Guido Tack (Monash University - Clayton, AU)
Contact
- Michael Gerke (for scientific matters)
- Simone Schilke (for administrative matters)
Automated Reasoning (AR) emerged as a field in the late 1960s. Its sub-areas cover different aspects of deductive reasoning as practiced in mathematics and formal logic, such as automated theorem proving (ATP), propositional satisfiability (SAT), and various generalizations. Practical and theoretical research has enabled ground-breaking success across a vast range of applications of formal methods.
At the core of this success are incredibly sophisticated and complex pieces of software (so-called solvers), which tackle specific problems in sub-areas of AR. Recurring (typically annual) solver competitions and large events such as the FLoC Olympic Games play a significant role in the practical success of AR. Solver competitions enable AR communities to showcase the current state of the art, document practical advancements, identify challenges from research and industry, and push solver developers not only to aim for reliable and robust tools for a wide range of applications, but to improve their tools beyond their current limitations.
Perhaps even more importantly, competitions are often considered the gold standard for empirical evaluations and have a significant impact on the research community and its empirical methods. Organizing a competition that yields meaningful results both for developers and users of the tools is challenging and requires a tremendous amount of work. And, since competitions serve as a driving force for new advancements in the area, certain decisions related to competition organization have a significant impact, as they often establish standards for formats and experimental evaluations. Thus, competition organizers, competition participants, authors, and reviewers all face similar issues, including how to:
- evaluate the scientific and theoretical value of advancements through competitions and assess the academic and practical implications of decisions made in competitions;
- collect, archive, compile, select, and distribute benchmark instances and solvers;
- construct meaningful evaluation measures and metrics (beyond pure runtime);
- ensure repeatability and easy replicability of results;
- design and architect experimental evaluations with a long-term perspective, as stable, easy-to-establish setups that use computational infrastructure efficiently;
- build a community and encourage effective communication with participants, among participants, and with other researchers.
Surprisingly, there is little interaction between competitions across AR communities, or between researchers working on empirical evaluations. Interaction happens primarily during the (short) preparation phase prior to competitions and, briefly, at conferences. Critical reflection and in-depth discussion of long-term implications among all stakeholders in the area rarely take place.
This Dagstuhl Seminar will center on these issues and discuss questions and solutions, with the aim of building a community of practice around competition organization and empirical evaluation in AR.
Classification
- Artificial Intelligence
- Distributed, Parallel, and Cluster Computing
- Logic in Computer Science
Keywords
- Automated Reasoning
- Constraint Solving
- Competitions
- Empirical Evaluation
- Design of Empirical Experiments