Dagstuhl Seminar 25032

Task and Situation-Aware Evaluation of Speech and Speech Synthesis

(Jan 12 – Jan 15, 2025)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/25032

Organizers
  • Jens Edlund
  • Sébastien Le Maguer
  • Christina Tånnander
  • Petra Wagner

Contact


Motivation

Call to Action

This Dagstuhl Seminar, "Task and Situation-Aware Evaluation of Speech and Speech Synthesis", is an opportunity to help redefine the metrics and methods traditionally used to evaluate speech synthesis and human speech as they are used across disciplines, tasks, and applications. The seminar is designed as a collaborative platform where experts from engineering, the humanities, the social sciences, and beyond come together to challenge the status quo and drive innovation. By combining perspectives and bridging gaps between scientific disciplines, we hope to uncover and develop evaluation techniques that are not only scientifically rigorous but also contextually relevant to the diverse uses of speech technology today. We seek the collective expertise of participants from diverse areas of science and technology – ranging from phonetics to machine learning, from rhetorical analysis to practical applications in assistive technologies. This interdisciplinary approach is essential for crafting evaluation standards that are as dynamic and nuanced as the technologies and applications they aim to assess. To this end, we encourage you to share, debate, and refine ideas, methods, and tools that can contribute to a transformative discussion on the evaluation of speech synthesis.

Context and Objectives

While the de facto standard TTS evaluation metrics, such as the Mean Opinion Score (MOS), have been criticised for decades, they currently face a barrage of publications pointing to a variety of flaws. More importantly, the recent disruptive progress in TTS techniques has rendered traditional evaluation targets such as naturalness (a hard-to-define blend of signal quality and human-likeness) and intelligibility all but obsolete. Indeed, one of the (positive) reviewers of this seminar proposal suggested asking the fundamental question: "Is evaluation for speech synthesis still needed?" We believe it is, but we also note that the current methods deliver problematic results, such as ceiling effects and the conclusion that synthetic voices are more human-like than human voices.
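
For readers less familiar with the metric under discussion, the following Python sketch (with invented ratings and a hypothetical function name) shows how a MOS is conventionally computed: the arithmetic mean of 1–5 listener ratings, usually reported with a confidence interval. It also makes the ceiling effect tangible: once most ratings cluster at the top of the scale, two systems become statistically indistinguishable.

    import math
    import statistics

    def mean_opinion_score(ratings):
        """Conventional MOS: the arithmetic mean of 1-5 listener ratings,
        reported with a normal-approximation 95% confidence interval."""
        mos = statistics.mean(ratings)
        sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
        return mos, (mos - 1.96 * sem, mos + 1.96 * sem)

    # Hypothetical ratings illustrating a ceiling effect: both systems sit
    # near the top of the scale and their confidence intervals overlap.
    system_a = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5]
    system_b = [5, 4, 5, 5, 4, 5, 5, 4, 5, 5]
    print(mean_opinion_score(system_a))  # roughly (4.7, (4.40, 5.00))
    print(mean_opinion_score(system_b))  # roughly (4.7, (4.40, 5.00))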

Our concern here is that the conventional methods fall short when it comes to addressing the contextual nuances and the specific application needs of modern synthesized speech. If we change the question from "Rank the naturalness of this voice" to "Rank this voice as if it were the voice of a professional human performing the same task", the artificial voice will rank considerably lower for a great many tasks. There is a pressing need for more sophisticated approaches that take contextual and situational framing into account and incorporate the complexity and diversity of current and future speech synthesis applications. This may also involve a more nuanced treatment of the participants in evaluations, as the assumption of normally distributed participant characteristics in all populations is unlikely to hold.
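
As a small illustration of this last point (hypothetical ratings, standard SciPy tests), the sketch below shows how a mean-based summary can hide the difference between a voice that polarises listeners and one that is consistently average; only a comparison of the full rating distributions reveals it.

    from scipy import stats

    # Hypothetical listener ratings (1-5). System A polarises listeners
    # (many 1s and 5s); system B is consistently mid-range. Their means
    # are identical, so a mean-based metric such as MOS cannot tell them apart.
    system_a = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5] * 3
    system_b = [3, 3, 3, 3, 2, 3, 3, 3, 3, 4] * 3

    print(sum(system_a) / len(system_a), sum(system_b) / len(system_b))  # 3.0 3.0

    # A t-test (which assumes normally distributed scores) and a rank-based
    # test both compare location and see no difference here, showing that
    # the problem lies in the summary statistic, not only in the test.
    print(stats.ttest_ind(system_a, system_b))
    print(stats.mannwhitneyu(system_a, system_b))

    # Comparing the full distributions exposes the difference immediately.
    print(stats.ks_2samp(system_a, system_b))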

Agenda Overview

The seminar progresses through a series of sessions of differing natures, all of which are partially prepared in advance by organisers and participants alike. After an initial Existing Evaluation Methods Review, in which specifications, validations, and known issues with existing methods are discussed as a benchmark for new methodologies, we delve into Use Case Workshops focused on identifying and detailing specific use cases for speech synthesis and human speech alike, in order to better understand the varied requirements of different applications. Next, we engage in Hands-On Method Development in group sessions to propose, develop, and refine evaluation methods for selected use cases. These practical sessions transition from theory to action, allowing participants to experiment with and iteratively improve evaluation approaches. Finally, the results are discussed and collected in Guideline Formulation, where we work towards formulating guidelines for selecting and implementing evaluation methods. This session will focus on creating a decision framework that assists researchers and practitioners in choosing appropriate evaluation metrics based on specific criteria.
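
Purely as an illustration of what such a decision framework might look like (the criteria and method names below are invented placeholders, not seminar outcomes), one could imagine mapping properties of a use case to candidate evaluation methods:

    # Hypothetical decision framework: map properties of a speech synthesis
    # use case to candidate evaluation methods. All criteria and method
    # names are illustrative placeholders, not seminar recommendations.

    def recommend_methods(use_case):
        methods = []
        if use_case.get("long_form_listening"):
            # Sentence-level MOS says little about fatigue over long passages.
            methods.append("extended listening test with comprehension questions")
        if use_case.get("task_oriented"):
            # Judge the voice against the task, not in isolation.
            methods.append("task completion / behavioural evaluation in context")
        if use_case.get("assistive_technology"):
            # Evaluate with the actual user group rather than crowd workers.
            methods.append("in-situ evaluation with target users")
        if not methods:
            methods.append("conventional listening test (e.g. MOS) as a baseline")
        return methods

    print(recommend_methods({"long_form_listening": True, "task_oriented": True}))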

Copyright Jens Edlund, Sébastien Le Maguer, Christina Tånnander, and Petra Wagner

Participants

Classification
  • Artificial Intelligence
  • Computation and Language
  • Human-Computer Interaction

Keywords
  • Evaluation
  • Human-in-the-Loop
  • Speech technology
  • Text-to-Speech Synthesis