Dagstuhl Seminar 24311
Resource-Efficient Machine Learning
(Jul 28 – Aug 2, 2024)
Organizers
- Oana Balmau (McGill University - Montréal, CA)
- Matthias Böhm (TU Berlin, DE)
- Ana Klimovic (ETH Zürich, CH)
- Peter R. Pietzuch (Imperial College London, GB)
- Pinar Tözün (IT University of Copenhagen, DK)
Contact
- Andreas Dolzmann (for scientific matters)
- Christina Schwarz (for administrative matters)
Shared Documents
- Dagstuhl Materials Page
While the capabilities of machine learning models have become increasingly impressive over the last decade, one cannot overlook the computational footprint of their end-to-end lifecycle. According to the Stanford AI Index Report [1], the computational complexity of state-of-the-art language models has increased by seven orders of magnitude since 2017. In turn, the estimated cost of training these models in the cloud has increased by five orders of magnitude, and the carbon footprint of training such models is equivalent to tens of human-years. Furthermore, training is only a fraction of the total cost. After training comes the cost of continuously deploying these models, which depends on how the models are used for inference and how frequently they are retrained.
The participants of the Dagstuhl Seminar on Resource-Efficient Machine Learning (ML) tackled the computational efficiency challenges of machine learning, especially deep learning, from different angles and at different stages of the ML lifecycle. On the first day of the seminar, we split into four groups, each with a specific focus. The groups identified the research questions they wanted to focus on, delved deeper into existing work, and identified future steps to continue collaborations beyond the seminar.
The first group, Resource-Efficient Data Selection (Section 3 of the full report), targets the efficiency of data selection methods for training deep learning models. Data selection is a preliminary step before any model training, and is especially relevant for fine-tuning tasks, where a pre-trained model must be specialized for a specific task. The effectiveness of a data selection method is typically evaluated by the accuracy it achieves on the given task. The implicit assumption is that, if one can achieve a certain accuracy while using less data, training efficiency improves as a side effect. This group questions this assumption and asks the following research question: what are the trade-offs between the computational complexity of a data selection method, its effectiveness in terms of model accuracy, and the end-to-end training efficiency?
The second group, The Future of Portable, Extensible, and Composable Machine Learning Systems (Section 4 of the full report), aims to make emerging ML systems support a larger diversity of applications more efficiently. At the core of this support lies a departure from the dominant reliance on dense tensor computations. Targeting a larger diversity of applications also requires looking at a larger variety of hardware devices, beyond large accelerators that are highly optimized for dense matrix computations. Supporting such diversity requires co-design and the right abstractions across software, compilers, and hardware. The group's vision results in several research challenges with the following overarching research question: how do we design holistic and composable software and hardware frameworks for ML?
The third group, Hardware-Software Co-Design for Machine Learning (Section 5 of the full report), targets research challenges similar to those of the second group, but with a deeper focus on hardware diversity. The group observes that the conventional way of optimizing machine learning tasks for a particular hardware device is through tight coupling between high-level ML engineering and low-level performance optimizations. High-level optimizations made with specific hardware in mind, in turn, hinder portability to other hardware devices, resulting in sub-optimal efficiency and missed opportunities for functionality. Therefore, the key research question here is: how does one create a hardware stack for ML that enables better portability across different devices?
The fourth group, Workload-Aware Machine Learning Serving (Section 6 of the full report), focuses on ML serving. Serving ML models, especially large language models (LLMs), at scale is highly costly and requires substantial hardware resources. To serve models in a more resource- and performance-efficient way, one needs to adaptively determine which specialized model to serve or cache, or how to optimize a model. This adaptivity depends heavily on workload needs, which may be dynamic. This group, therefore, aims to answer the following questions: (1) what are the behavior and needs of real-world serving workloads, and (2) how does one build a framework that enables adaptive model serving based on dynamic user and workload needs?
References
- The AI Index Report: Measuring Trends in AI. https://aiindex.stanford.edu/report/, 2024.

The advances in Machine Learning (ML) are mainly due to the exponential evolution of hardware, the availability of large datasets, and the emergence of machine learning frameworks that hide the complexities of the underlying hardware and boost the productivity of data scientists. On the other hand, the computational needs of powerful ML models have increased by five orders of magnitude in the past decade. This makes the current rate of growth in model parameters, dataset sizes, and compute budgets unsustainable. To achieve more sustainable progress in ML, it is essential to invest in more resource-, energy-, and cost-efficient solutions. In this Dagstuhl Seminar, we will explore how to improve ML resource efficiency through software/hardware co-design. We plan to take a holistic view of the ML landscape, which includes data preparation and loading, continual retraining of models in dynamic data environments, compiling ML for specialized hardware accelerators, and serving models for real-time applications with low-latency requirements in constrained resource environments.
This seminar aims to reason critically about how we build software and hardware for end-to-end machine learning. We hope that the discussions will raise awareness of how well modern hardware is utilized and kickstart future developments to minimize hardware underutilization. We therefore bring together academics and industry practitioners across the fields of data management, machine learning, systems, and computer architecture, covering expertise in algorithmic optimizations for machine learning, job scheduling and resource management in distributed computing, parallel computing, and data management and processing. The outcome of the discussions will thus also benefit research groups and companies that rely on machine learning.
We have identified the following topics to be discussed during the seminar:
- Characterization and benchmarking of modern ML techniques
- Efficient scheduling of ML tasks
- Measuring sustainability
- Hardware-software co-design for ML
- Data pre-processing and loading

Participants
- Matthias Böhm (TU Berlin, DE) [dblp]
- Maximilian Böther (ETH Zürich, CH) [dblp]
- Marco Canini (KAUST - Thuwal, SA) [dblp]
- Jerónimo Castrillón-Mazo (TU Dresden, DE) [dblp]
- Patrick Damme (TU Berlin, DE) [dblp]
- Khuzaima Daudjee (University of Waterloo, CA) [dblp]
- Pamela Delgado (HES-SO Valais-Wallis - Siders, CH) [dblp]
- Thaleia Dimitra Doudali (IMDEA Software Institute - Madrid, ES) [dblp]
- Jens Hagemeyer (Universität Bielefeld, DE) [dblp]
- Steven Hand (Google - Mountain View, US) [dblp]
- Dagmar Kainmüller (Max-Delbrück-Centrum - Berlin, DE) [dblp]
- Fredrik Kjolstad (Stanford University, US) [dblp]
- Ana Klimovic (ETH Zürich, CH) [dblp]
- James Kwok (HKUST - Hong Kong, HK) [dblp]
- Manisha Luthra Agnihotri (TU Darmstadt, DE & DFKI - Darmstadt, DE) [dblp]
- Peter R. Pietzuch (Imperial College London, GB) [dblp]
- Tilmann Rabl (Hasso-Plattner-Institut, Universität Potsdam, DE) [dblp]
- Theo Rekatsinas (Apple - Zürich, CH) [dblp]
- Ties Robroek (IT University of Copenhagen, DK) [dblp]
- Stefanie Scherzinger (Universität Passau, DE) [dblp]
- Tom St. John (Meta - Menlo Park, US) [dblp]
- Foteini Strati (ETH Zürich, CH) [dblp]
- Shinya Takamaeda-Yamazaki (The University of Tokyo, JP) [dblp]
- Pinar Tözün (IT University of Copenhagen, DK) [dblp]
- Lluís Vilanova (Imperial College London, GB) [dblp]
- Eiko Yoneki (University of Cambridge, GB) [dblp]
- Cliff Young (Google - Mountain View, US) [dblp]
- Ehsan Yousefzadeh-Asl-Miandoab (IT University of Copenhagen, DK) [dblp]
Classification
- Hardware Architecture
- Machine Learning
- Operating Systems
Keywords
- resource-efficient systems
- systems for machine learning