Dagstuhl Seminar 20091
SE4ML – Software Engineering for AI-ML-based Systems
(Feb 23 – Feb 28, 2020)
Organizers
- Kristian Kersting (TU Darmstadt, DE)
- Miryung Kim (UCLA, US)
- Guy Van den Broeck (UCLA, US)
- Thomas Zimmermann (Microsoft Corporation - Redmond, US)
Contact
- Michael Gerke (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Multiple research disciplines, from cognitive sciences to biology, finance, physics, and the social sciences, as well as many companies, believe that data-driven and intelligent solutions are necessary. Unfortunately, current artificial intelligence (AI) and machine learning (ML) technologies are not sufficiently democratized — building complex AI and ML systems requires deep expertise in computer science and extensive programming skills to work with various machine reasoning and learning techniques at a rather low level of abstraction. It also requires extensive trial and error exploration for model selection, data cleaning, feature selection, and parameter tuning. Moreover, there is a lack of theoretical understanding that could be used to abstract away these subtleties. Conventional programming languages and software engineering paradigms have also not been designed to address challenges faced by AI and ML practitioners.
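To make this trial-and-error loop concrete, here is a minimal sketch of brute-force model selection and hyperparameter tuning. The use of scikit-learn and the digits dataset is an illustrative assumption, not a tool choice tied to the seminar; the point is how much low-level search the practitioner must orchestrate by hand.

```python
# Minimal sketch of the trial-and-error loop in model selection and
# hyperparameter tuning. scikit-learn and the toy dataset are assumptions
# made purely for illustration.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# The "low level of abstraction": the practitioner enumerates kernels and
# regularization strengths by hand and lets cross-validation pick a winner.
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```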
The goal of this Dagstuhl Seminar is to bring together two rather disjoint communities, software engineering and programming languages (PL/SE) and artificial intelligence and machine learning (AI-ML), to discuss open problems on how to improve the productivity of data scientists, software engineers, and AI-ML practitioners in industry. The issues addressed in the seminar will include the following:
- What challenges do people building AI-ML-based systems face?
- How do we re-think software development tools such as debugging, testing, and verification tools for complex AI-ML-based systems?
- How do we reason about correctness, explainability, repeatability, traceability, and fairness while building AI-ML pipelines?
- What are innovative paradigms that seamlessly embed, reuse, and chain models, while abstracting away most low-level details?
The topics of the seminar address pressing demands from industry; the research questions are highly relevant for the development of practical software systems that leverage artificial intelligence (AI) and machine learning (ML). In 2016, companies invested $26–39 billion in AI, and McKinsey predicts that investments will keep growing over the next few years. Any AI- or ML-based system will need to be built, tested, and maintained, yet there is a lack of established engineering practices in industry for such systems because they are fundamentally different from traditional software systems. Ideas brainstormed in the seminar will contribute to a new suite of ML-relevant software development tools, such as debuggers, testers, and verification tools, that increase developer productivity in building complex AI systems. We will also discuss innovative AI and ML abstractions that improve programmability in designing intelligent systems.
The seminar delivered on these goals: it brainstormed ideas for a new suite of ML-relevant software development tools, such as debuggers, testers, and verification tools, that increase developer productivity in building complex AI systems, and it discussed innovative AI and ML abstractions that improve programmability in designing intelligent systems.
The seminar brought together a diverse set of attendees, primarily from two distinct communities: software engineering and programming languages on one side, and AI and machine learning on the other. Even within each community, attendees had varied backgrounds and research emphases. For example, within software engineering, the profile of our attendees ranged from pure programming languages and development methodologies to automated testing. Within AI, the seminar brought together people from classical AI as well as leading experts on applied machine learning, machine learning systems, and more. We also had several attendees from adjacent fields, for example attendees whose concerns are closer to human-computer interaction, as well as representatives from industry. For these reasons, the first two days of the seminar were devoted to bringing all attendees up to speed on the perspective that each field takes on the problem of developing, maintaining, and testing AI/ML systems.
On the first day of the seminar, Ahmed Hassan and Tim Menzies represented the field of software engineering. Their talks laid the foundation for much of the subsequent discussion by presenting key definitions in software engineering for machine learning (SE4ML), identifying areas of synergy between the fields, sharing their experiences with industry partners, and listing important open problems. Sameer Singh and Christopher Ré gave the first day's introduction to machine learning. Christopher Ré described recent efforts in building machine learning systems that help maintain AI/ML systems, specifically by managing training data and by monitoring a deployed system to ensure it keeps performing adequately. Sameer Singh's talk focused on finding and debugging bugs in machine learning systems, whether by inspecting black-box explanations, by generating realistic adversarial examples in natural language processing (NLP), or by behavioral testing of NLP models to make them more robust.
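To make the idea of behavioral testing concrete, below is a minimal, library-free sketch of an invariance test: a model's prediction should not flip when a semantically irrelevant detail, such as a person's name, changes. The toy `sentiment` function is a hypothetical stand-in for a real NLP model, and a real test suite would run many templated perturbations.

```python
# Minimal sketch of a behavioral (invariance) test for an NLP model.
# `sentiment` is a hypothetical stand-in for a real classifier.
def sentiment(text: str) -> str:
    """Toy classifier: a real test would call a trained model instead."""
    return "positive" if "great" in text.lower() else "negative"

def test_name_invariance():
    # Changing a proper name should not change the predicted label.
    template = "{} thought the movie was great."
    names = ["Alice", "Bob", "Priya", "Chen"]
    labels = {sentiment(template.format(name)) for name in names}
    assert len(labels) == 1, f"prediction flips across names: {labels}"

test_name_invariance()
print("invariance test passed")
```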
The second day of the seminar continued to introduce the attendees to prominent approaches for tackling the SE4ML problem. Elena Glassman presented her work at the intersection of human-computer interaction and software engineering, while Jie Zhang gave an overview of software testing for ML, based on her recent survey of the field. Significant attention during the seminar was devoted to the problem of deploying machine learning models in environments that change over time, where the behavior of the AI/ML system diverges from the behavior intended when the model was first developed. Such issues were discussed, for example, by Barbara Hammer in her talk on machine learning in non-stationary environments (a small monitoring sketch for this setting follows this paragraph). Isabel Valera introduced the seminar to another important consideration when developing AI/ML-based systems: interpretability and algorithmic fairness. Andrea Passerini's talk explained some basic principles of machine learning for a non-machine-learning audience, for example generalization, regularization, and overfitting, as well as recent trends in combining learning with symbolic reasoning.
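As a small illustration of the monitoring problem in non-stationary environments, the sketch below tracks a deployed model's rolling accuracy on labeled feedback and flags a drop that may indicate drift. The window size and threshold are illustrative assumptions, not recommendations from the talk.

```python
from collections import deque

# Minimal sketch of drift monitoring: track the rolling accuracy of a
# deployed model and raise an alert when it falls below a threshold.
# Window size and threshold are illustrative assumptions.
class DriftMonitor:
    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, label) -> bool:
        """Record one prediction/label pair; return True if drift is suspected."""
        self.outcomes.append(prediction == label)
        full = len(self.outcomes) == self.outcomes.maxlen
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return full and accuracy < self.threshold

# Toy usage: a stream of (prediction, true label) pairs from production.
monitor = DriftMonitor(window=5, threshold=0.8)
stream = [(1, 1), (0, 0), (1, 0), (0, 1), (1, 0), (0, 1)]
for pred, label in stream:
    if monitor.record(pred, label):
        print("possible drift: rolling accuracy below threshold")
```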
The remainder of the seminar was centered around various breakout sessions and working groups, including sessions on (1) Specifications and Requirements, (2) Debugging and Testing, (3) Model Evolution and Management, and (4) Knowledge Transfer and Education. There were extended discussions on what constitutes a bug in an AI/ML setting, how such bugs can be organized into a taxonomy, and which real-world examples of such bugs occur in practice. Interleaved with these working groups were several demand-driven talks, designed to answer questions that came up during the discussions. For example, Steven Holtzen and Parisa Kordjamshidi introduced the seminar to efforts in the AI community to build higher-level languages for machine learning, in particular probabilistic programming and declarative learning-based programming (see the sketch after this paragraph). Christian Kästner shared his insights from teaching software engineering for AI/ML-based systems using realistic case studies. Molham Aref gave his unique industry view on developing such systems, which was a tremendously valuable perspective to include in these discussions.
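For readers unfamiliar with probabilistic programming as a higher-level abstraction, here is a minimal, library-free sketch of the idea: the model is written as ordinary code, and inference is delegated to a generic routine (naive rejection sampling here; real probabilistic programming systems provide far more efficient inference). The two-coin model and the query are purely illustrative.

```python
import random

# Minimal, library-free sketch of the probabilistic-programming idea:
# the model is ordinary code; inference is handled by a generic routine.
def model():
    """Generative model: two fair coin flips and their disjunction."""
    a = random.random() < 0.5
    b = random.random() < 0.5
    return {"a": a, "b": b, "a_or_b": a or b}

def rejection_sample(model, condition, n=100_000):
    """Estimate P(a | condition) by keeping only samples where condition holds."""
    kept = [s for s in (model() for _ in range(n)) if condition(s)]
    return sum(s["a"] for s in kept) / len(kept)

# Query: probability that coin a came up heads, given that a or b did.
# Analytically this is 0.25 / 0.75 = 2/3.
print(rejection_sample(model, lambda s: s["a_or_b"]))
```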
Overall, this seminar produced numerous new insights into how complex AI-ML systems are designed, debugged, and tested. It was able to build important scientific bridges between otherwise disparate fields, and has spurred collaborations and follow-up work.
Participants
- Hadil Abukwaik (ABB - Ladenburg, DE)
- Molham Aref (relationalAI - Berkeley, US) [dblp]
- Earl T. Barr (University College London, GB) [dblp]
- Houssem Ben Braiek (Polytechnique Montréal, CA)
- Pavol Bielik (ETH Zürich, CH) [dblp]
- Carsten Binnig (TU Darmstadt, DE) [dblp]
- Luc De Raedt (KU Leuven, BE) [dblp]
- Rob DeLine (Microsoft Corporation - Redmond, US) [dblp]
- Joachim Giesen (Universität Jena, DE) [dblp]
- Elena Leah Glassman (Harvard University - Cambridge, US) [dblp]
- Nikolas Göbel (RelationalAI - Zürich, CH)
- Jin L.C. Guo (McGill University - Montreal, CA)
- Barbara Hammer (Universität Bielefeld, DE) [dblp]
- Fabrice Harel-Canada (UCLA, US)
- Ahmed E. Hassan (Queen's University - Kingston, CA) [dblp]
- Steven Holtzen (UCLA, US)
- Christian Kästner (Carnegie Mellon University - Pittsburgh, US) [dblp]
- Kristian Kersting (TU Darmstadt, DE) [dblp]
- Miryung Kim (UCLA, US) [dblp]
- Angelika Kimmig (Cardiff University, GB) [dblp]
- Parisa Kordjamshidi (Michigan State University - East Lansing, US)
- Vu Le (Microsoft Corporation - Redmond, US) [dblp]
- Rupak Majumdar (MPI-SWS - Kaiserslautern, DE) [dblp]
- Tim Menzies (North Carolina State University - Raleigh, US) [dblp]
- Andreas Metzger (Universität Duisburg - Essen, DE) [dblp]
- Mira Mezini (TU Darmstadt, DE) [dblp]
- Alejandro Molina (TU Darmstadt, DE)
- Sandeep Neema (DARPA - Arlington, US) [dblp]
- Siegfried Nijssen (UC Louvain, BE) [dblp]
- Andrea Passerini (University of Trento, IT) [dblp]
- Michael Pradel (Universität Stuttgart, DE) [dblp]
- Christopher Ré (Stanford University, US) [dblp]
- Sameer Singh (University of California - Irvine, US) [dblp]
- Daniel Speicher (Universität Bonn, DE)
- Isabel Valera (MPI für Intelligente Systeme - Tübingen, DE) [dblp]
- Guy Van den Broeck (UCLA, US) [dblp]
- Antonio Vergari (UCLA, US)
- Laurie Williams (North Carolina State University - Raleigh, US) [dblp]
- Ce Zhang (ETH Zürich, CH) [dblp]
- Jie Zhang (University College London, GB) [dblp]
- Tianyi Zhang (Harvard University - Cambridge, US) [dblp]
- Xiangyu Zhang (Purdue University - West Lafayette, US) [dblp]
- Thomas Zimmermann (Microsoft Corporation - Redmond, US) [dblp]
Classification
- artificial intelligence / robotics
- programming languages / compiler
- software engineering
Keywords
- Correctness / explainability / traceability / fairness for ML
- Debugging / testing / verification for ML systems
- Data scientist productivity