Dagstuhl-Seminar 24242
Computational Analysis and Simulation of the Human Voice
(June 9 – June 14, 2024)
Organizers
- Peter Birkholz (TU Dresden, DE)
- Oriol Guasch Fortuny (Ramon Llull University - Barcelona, ES)
- Nathalie Henrich Bernardoni (University Grenoble Alpes, FR)
- Sten Ternström (KTH Royal Institute of Technology - Stockholm, SE)
Contact
- Marsha Kleinbauer (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Shared Documents
- Dagstuhl Materials Page (Use personal credentials as created in DOOR to log in)
Program
The human voice can produce a remarkably rich set of sounds, making it the single most important channel for human-to-human communication, and potentially also for human-computer interaction. Spoken communication can be thought of as a stack of layered transport protocols comprising language, speech, voice, and sound. In this Dagstuhl Seminar, we will be concerned with the voice and its function as a transducer from neurally encoded speech patterns to sound. This very complex mechanism remains insufficiently explained, both in terms of analysing voice sounds (as in the medical assessment of vocal function) and of simulating them from first principles (as in talking or singing machines). The seminar will have four main themes:
Voice Analysis: Measures derived from voice recordings are clinically attractive, being non-invasive and relatively inexpensive. Quantitative, objective measures of vocal status have been researched for some seven decades, yet perceptual assessment by listening remains the dominant method in clinical voice assessment. Isolating the properties of a voice (the machine) from those of its owner's speech or singing (the process) is far from trivial. We will explore how computational approaches might facilitate a functional decomposition that can advance beyond conventional cut-off values of metrics and indices.
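As one concrete example of such a conventional metric, local jitter quantifies cycle-to-cycle variation of the glottal period. A minimal Python sketch (illustrative only; the input values are hypothetical):

```python
def local_jitter(periods):
    """Mean absolute difference between consecutive glottal cycle
    periods, normalized by the mean period (a conventional
    perturbation metric, returned as a fraction, not a percent)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

# Hypothetical cycle periods in seconds (roughly 100 Hz phonation)
jitter = local_jitter([0.0100, 0.0101, 0.0099, 0.0100])
```

A single number like this is exactly the kind of cut-off metric the theme aims to move beyond.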
Voice Visualization: Trained listeners can deduce some of what is going on in the larynx and the vocal tract, but we cannot easily see or document it. The multidimensionality of the voice poses interesting challenges to the making of effective visualizations. Most current visualizations are textbook transforms of the acoustic signal, but they are not as clinically or pedagogically relevant as they might be. Can functionally or perceptually informed visualizations improve on this situation?
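For reference, the textbook transform behind most current visualizations is the short-time Fourier transform, i.e. the spectrogram. A minimal NumPy sketch (illustrative; the window and hop sizes are arbitrary choices, and the 440 Hz test tone is a stand-in for a voice recording):

```python
import numpy as np

def spectrogram(x, win=256, hop=128):
    """Magnitude STFT: Hann-windowed frames, FFT per frame.
    Returns an array of shape (num_frames, win // 2 + 1)."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

fs = 8000                                  # assumed sampling rate, Hz
t = np.arange(fs) / fs                     # one second of signal
S = spectrogram(np.sin(2 * np.pi * 440 * t))
```

Each row of `S` is one spectral slice; plotting `S.T` on a time-frequency grid gives the familiar picture that the theme asks whether we can improve on.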
Voice Simulation: A "complete" physics-based computational model of the voice organ would have to account for bidirectional energy exchange between fluids and moving structures, in 3D and at high temporal and spatial resolution. Computational brute force is still not an option for representing voice production in all its complexity, so a proper balance between high- and low-order approaches must be found. We will discuss strategies for choosing effective partitionings or hybrids of the simulation tasks that could suit specific sub-problems.
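As a reminder of what the low-order end of this spectrum looks like, the classical source-filter approach drives a resonant filter with a simple glottal source. A minimal NumPy sketch of a single second-order "formant" resonator (the formant frequency, bandwidth, and impulse-train source are illustrative simplifications):

```python
import numpy as np

def formant_resonator(x, f_c, bw, fs):
    """Second-order IIR resonator: one 'formant' of a low-order
    source-filter voice model. Pole radius set from the bandwidth,
    pole angle from the centre frequency."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2.0 * np.pi * f_c / fs
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    b0 = 1.0 - r                       # rough gain normalization
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = (b0 * x[n]
                - a1 * (y[n - 1] if n >= 1 else 0.0)
                - a2 * (y[n - 2] if n >= 2 else 0.0))
    return y

fs = 16000
source = np.zeros(fs // 10)
source[::fs // 100] = 1.0              # 100 Hz impulse train as source
vowel = formant_resonator(source, f_c=700, bw=80, fs=fs)  # ~first formant of /a/
```

A high-order model would instead resolve the 3D fluid-structure interaction directly; the theme asks where, between these two extremes, each sub-problem is best placed.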
Data Science and Voice Research: With today's machine learning and deep neural network methods, end-to-end systems for both text-to-speech and speech recognition have become remarkably successful, but they remain quite ignorant of the basics of vocal function. Yet machine learning and big data science approaches should be very useful for helping us deal with and account for the variability in voices. Rather than seeking automated discrimination between normal and pathological voices, clinicians wish for objective assessments of the progress of an intervention, while researchers wish for ways to distil succinct models of voice production from multi-modal big-data observations. We will explore how techniques such as domain-specific feature selection and auto-encoding can make progress toward these goals.
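To make the auto-encoding idea concrete: an auto-encoder compresses feature vectors through a low-dimensional bottleneck and learns to reconstruct them, so the bottleneck code captures the dominant directions of variation. A minimal NumPy sketch of a linear auto-encoder trained by gradient descent on synthetic data (the data, dimensions, and learning rate are arbitrary stand-ins, and constant factors are folded into the learning rate):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # stand-in for acoustic feature vectors
X[:, :2] *= 5.0                          # two directions dominate the variance

d, k, lr = X.shape[1], 2, 1e-3           # 8-dim input, 2-dim bottleneck
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))

def recon_loss(X, We, Wd):
    """Mean squared reconstruction error through the bottleneck."""
    return float(np.mean((X @ We @ Wd - X) ** 2))

loss_before = recon_loss(X, W_enc, W_dec)
for _ in range(500):
    Z = X @ W_enc                        # low-dimensional code
    R = Z @ W_dec - X                    # reconstruction residual
    W_dec -= lr * (Z.T @ R) / len(X)     # gradient step on decoder
    W_enc -= lr * (X.T @ (R @ W_dec.T)) / len(X)  # gradient step on encoder
loss_after = recon_loss(X, W_enc, W_dec)
```

In the linear case this recovers essentially a principal subspace; nonlinear encoders, and features chosen with domain knowledge of vocal function, are where the theme expects the real gains.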
We expect that this seminar will result in (1) leading researchers in the vocological community becoming up-to-date on recent computational advances, (2) seasoned computer scientists and data analysts becoming engaged in voice-related challenges, (3) a critical review of the potential and limitations of deep learning and computational mechanics techniques as applied to analysis and simulation of the voice, and (4) a week of creative brainstorming, leading to a roadmap for pursuing outstanding issues in computational voice research.
Participants
- Philipp Aichinger (Medizinische Universität Wien, AT) [dblp]
- Marc Arnela (Ramon Llull University, ES) [dblp]
- Lucie Bailly (Université Grenoble Alpes - Saint Martin d'Hères, FR)
- Peter Birkholz (TU Dresden, DE) [dblp]
- Meike Brockmann-Bauser (Universitätsspital Zürich, CH)
- Helena Daffern (University of York, GB) [dblp]
- Michael Döllinger (Universitätsklinikum Erlangen, DE) [dblp]
- Mennatallah El-Assady (ETH Zürich, CH) [dblp]
- Sidney Fels (University of British Columbia - Vancouver, CA) [dblp]
- Mario Fleischer (Charité - Berlin, DE) [dblp]
- Andrés Gómez-Rodellar (NeuSpeLab - Las Rozas de Madrid, ES) [dblp]
- Pedro Gomez-Vilda (NeuSpeLab - Las Rozas de Madrid, ES) [dblp]
- Oriol Guasch Fortuny (Ramon Llull University - Barcelona, ES) [dblp]
- Amelia Gully (University of York, GB) [dblp]
- Nathalie Henrich Bernardoni (University Grenoble Alpes, FR) [dblp]
- Eric Hunter (University of Iowa, US)
- Filipa M.B. Lã (UNED - Madrid, ES) [dblp]
- Yves Laprie (LORIA - Nancy, FR) [dblp]
- Sarah Lehoux (UCLA, US)
- Matthias Miller (ETH Zürich, CH) [dblp]
- Scott Reid Moisik (Nanyang TU - Singapore, SG) [dblp]
- Peter Pabon (Utrecht, NL)
- Jean Schoentgen (Free University of Brussels, BE) [dblp]
- Brad Story (University of Arizona - Tucson, US) [dblp]
- Johan Sundberg (KTH Royal Institute of Technology - Stockholm, SE) [dblp]
- Sten Ternström (KTH Royal Institute of Technology - Stockholm, SE) [dblp]
- Tino Weinkauf (KTH Royal Institute of Technology - Stockholm, SE) [dblp]
- Qian Xue (Rochester Institute of Technology, US)
- Zhaoyan Zhang (UCLA, US) [dblp]
Classification
- Machine Learning
- Sound
Keywords
- voice analysis
- voice simulation
- health care
- visualization