Dagstuhl Seminar 25032

Task and Situation-Aware Evaluation of Speech and Speech Synthesis

(Jan 12 – Jan 15, 2025)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/25032

Organizers
  • Jens Edlund
  • Sébastien Le Maguer
  • Christina Tånnander
  • Petra Wagner

Contact


Motivation

Call to Action

This Dagstuhl Seminar, "Task and Situation-Aware Evaluation of Speech and Speech Synthesis", is an opportunity to help redefine the metrics and methods traditionally used to evaluate speech synthesis and human speech as they are used across disciplines, tasks, and applications. The seminar is designed as a collaborative platform where experts from engineering, the humanities, the social sciences, and beyond come together to challenge the status quo and drive innovation. By combining perspectives and bridging gaps between scientific disciplines, we hope to uncover and develop evaluation techniques that are not only scientifically rigorous but also contextually relevant to the diverse uses of speech technology today. We seek the collective expertise of participants from diverse areas of science and technology – ranging from phonetics to machine learning, from rhetorical analysis to practical applications in assistive technologies. This interdisciplinary approach is essential for crafting evaluation standards that are as dynamic and nuanced as the technologies and applications they aim to assess. To this end, we encourage you to share, debate, and refine ideas, methods, and tools that can contribute to a transformative discussion on the evaluation of speech synthesis.

Context and Objectives

While the de facto standard TTS evaluation metrics, such as the Mean Opinion Score (MOS), have been criticised for decades, they currently face a barrage of publications pointing to a variety of flaws. More importantly, the recent disruptive progress in TTS techniques has rendered traditional evaluation targets such as naturalness (a hard-to-define blend of signal quality and human-likeness) and intelligibility all but obsolete. Indeed, one of the (positive) reviewers of this seminar proposal suggested asking the fundamental question: "Is evaluation for speech synthesis still needed?" We believe it is, but we also note that the current methods deliver problematic results, such as ceiling effects and the conclusion that synthetic voices are more human-like than human voices.
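
For readers less familiar with the metric under discussion, the following Python sketch (with invented ratings and a hypothetical function name) shows how a MOS is conventionally computed: the arithmetic mean of 1–5 listener ratings, usually reported with a confidence interval. It also makes the ceiling effect tangible: once most ratings cluster at the top of the scale, two systems become statistically indistinguishable.

    import math
    import statistics

    def mean_opinion_score(ratings):
        """Conventional MOS: the arithmetic mean of 1-5 listener ratings,
        reported with a normal-approximation 95% confidence interval."""
        mos = statistics.mean(ratings)
        sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
        return mos, (mos - 1.96 * sem, mos + 1.96 * sem)

    # Hypothetical ratings illustrating a ceiling effect: both systems sit
    # near the top of the scale and their confidence intervals overlap.
    system_a = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5]
    system_b = [5, 4, 5, 5, 4, 5, 5, 4, 5, 5]
    print(mean_opinion_score(system_a))  # roughly (4.7, (4.40, 5.00))
    print(mean_opinion_score(system_b))  # roughly (4.7, (4.40, 5.00))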

Our concern here is that the conventional methods fall short when it comes to addressing the contextual nuances and the specific application needs of modern synthesized speech. If we change the question from "Rank the naturalness of this voice" to "Rank this voice as if it were the voice of a professional human performing the same task", the artificial voice will rank considerably lower for a great many tasks. There is a pressing need for more sophisticated approaches that take contextual and situational framing into account and incorporate the complexity and diversity of current and future speech synthesis applications. This may also involve a more nuanced treatment of the participants in evaluations, as the assumption of normally distributed participant characteristics in all populations is unlikely to hold.
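
As a small illustration of this last point (hypothetical ratings, standard SciPy tests), the sketch below shows how a mean-based summary can hide the difference between a voice that polarises listeners and one that is consistently average; only a comparison of the full rating distributions reveals it.

    from scipy import stats

    # Hypothetical listener ratings (1-5). System A polarises listeners
    # (many 1s and 5s); system B is consistently mid-range. Their means
    # are identical, so a mean-based metric such as MOS cannot tell them apart.
    system_a = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5] * 3
    system_b = [3, 3, 3, 3, 2, 3, 3, 3, 3, 4] * 3

    print(sum(system_a) / len(system_a), sum(system_b) / len(system_b))  # 3.0 3.0

    # A t-test (which assumes normally distributed scores) and a rank-based
    # test both compare location and see no difference here, showing that
    # the problem lies in the summary statistic, not only in the test.
    print(stats.ttest_ind(system_a, system_b))
    print(stats.mannwhitneyu(system_a, system_b))

    # Comparing the full distributions exposes the difference immediately.
    print(stats.ks_2samp(system_a, system_b))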

Agenda Overview

The seminar progresses through a series of sessions of differing natures, all of which are partially prepared in advance by organisers and participants alike. After an initial Existing Evaluation Methods Review, in which specifications, validations, and known issues with existing methods are discussed as a benchmark for new methodologies, we delve into Use Case Workshops focused on identifying and detailing specific use cases for speech synthesis and human speech alike, in order to better understand the varied requirements of different applications. Next, we engage in Hands-On Method Development in group sessions to propose, develop, and refine evaluation methods for selected use cases. These practical sessions transition from theory to action, allowing participants to experiment with and iteratively improve evaluation approaches. Finally, the results are discussed and collected in Guideline Formulation, where we work towards formulating guidelines for selecting and implementing evaluation methods. This session will focus on creating a decision framework that assists researchers and practitioners in choosing appropriate evaluation metrics based on specific criteria.
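
Purely as an illustration of what such a decision framework might look like (the criteria and method names below are invented placeholders, not seminar outcomes), one could imagine mapping properties of a use case to candidate evaluation methods:

    # Hypothetical decision framework: map properties of a speech synthesis
    # use case to candidate evaluation methods. All criteria and method
    # names are illustrative placeholders, not seminar recommendations.

    def recommend_methods(use_case):
        methods = []
        if use_case.get("long_form_listening"):
            # Sentence-level MOS says little about fatigue over long passages.
            methods.append("extended listening test with comprehension questions")
        if use_case.get("task_oriented"):
            # Judge the voice against the task, not in isolation.
            methods.append("task completion / behavioural evaluation in context")
        if use_case.get("assistive_technology"):
            # Evaluate with the actual user group rather than crowd workers.
            methods.append("in-situ evaluation with target users")
        if not methods:
            methods.append("conventional listening test (e.g. MOS) as a baseline")
        return methods

    print(recommend_methods({"long_form_listening": True, "task_oriented": True}))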

Copyright Jens Edlund, Sébastien Le Maguer, Christina Tånnander, and Petra Wagner

Participants

Classification
  • Artificial Intelligence
  • Computation and Language
  • Human-Computer Interaction

Keywords
  • Evaluation
  • Human-in-the-Loop
  • Speech technology
  • Text-to-Speech Synthesis