Publication

Automatic Viseme Clustering for Audiovisual Speech Synthesis

Journal contribution - Journal article Conference contribution

A common approach in visual speech synthesis is the use of visemes as atomic units of speech. In this paper, phoneme-based and viseme-based audiovisual speech synthesis techniques are compared in order to explore the trade-off between data availability and improved audiovisual coherence when optimizing synthesis. A technique for automatic viseme clustering is described and compared to the standardized viseme set defined in MPEG-4. Both objective and subjective tests indicate that a phoneme-based approach leads to better synthesis results. In addition, the test results improve as more distinct visemes are defined. This raises some questions about the widely applied viseme-based approach. It appears that a many-to-one phoneme-to-viseme mapping cannot describe all the subtle details of the visual speech information. Moreover, with viseme-based synthesis the perceived synthesis quality is affected by the loss of audiovisual coherence in the synthetic speech.
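The many-to-one phoneme-to-viseme mapping mentioned in the abstract can be illustrated with a minimal sketch. The mapping table, the viseme labels, and the helper names `to_visemes` and `phonemes_per_viseme` are hypothetical and chosen only for illustration; they are neither the MPEG-4 viseme set nor the clusters derived in the paper.

```python
from collections import defaultdict

# Hypothetical toy mapping, for illustration only: NOT the MPEG-4 viseme set
# and NOT the automatically derived clusters from the paper.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar",
    "a": "V_open", "i": "V_spread",
}


def to_visemes(phonemes):
    """Replace every phoneme label by the viseme class it is mapped to."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


def phonemes_per_viseme(mapping):
    """Invert the mapping to list which phonemes share each viseme class."""
    groups = defaultdict(list)
    for phoneme, viseme in mapping.items():
        groups[viseme].append(phoneme)
    return dict(groups)


# Two different phoneme sequences collapse onto the same viseme sequence,
# so any visual difference between them is lost after the mapping.
seq_a = to_visemes(["p", "a", "t"])
seq_b = to_visemes(["b", "a", "d"])
print(seq_a == seq_b)
print(phonemes_per_viseme(PHONEME_TO_VISEME))
```

In this toy setting, defining more viseme classes means fewer phonemes share a class, which is one way to read the abstract's observation that results improve as more distinct visemes are defined.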

Journal: Proceedings of Interspeech

ISSN: 1990-9772

Issue: 2011

Pages: 2173-2176

Year of publication: 2011

Keywords: audiovisual speech synthesis, visemes, facial animation

Scopus Id: 84865793091
WoS Id: 000316502201035
