Towards a true acoustic-visual speech synthesis

Authors

  • Asterios Toutios
  • Utpala Musti
  • Slim Ouni
  • Vincent Colotte
  • Brigitte Wrobel-Dautcourt
  • Marie-Odile Berger
Abstract

This paper presents an initial bimodal acoustic-visual synthesis system that concurrently generates the speech signal and a 3D animation of the speaker’s face. It does so by concatenating bimodal diphone units that contain both acoustic and visual information. The visual information is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on the classic target and join costs from acoustic-only synthesis, augmented with a visual join cost. Preliminary results indicate the benefits of this approach: both the synthesized speech signal and the face animation are of good quality.
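
The unit-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `BimodalUnit` fields, the Euclidean distance, the weights `w_acoustic` and `w_visual`, and the `target_cost` callback are all hypothetical choices made for illustration. The sketch only shows how a visual join cost can be added to the acoustic join cost inside a standard Viterbi search over candidate diphones.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class BimodalUnit:
    """Hypothetical diphone candidate holding boundary features of both modalities."""
    label: str
    acoustic_start: Sequence[float]
    acoustic_end: Sequence[float]
    visual_start: Sequence[float]
    visual_end: Sequence[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance; an assumed stand-in for whatever metric a real system uses.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def join_cost(prev: BimodalUnit, nxt: BimodalUnit,
              w_acoustic: float = 1.0, w_visual: float = 1.0) -> float:
    # Acoustic join cost augmented with a visual join cost (weights are assumptions).
    return (w_acoustic * _dist(prev.acoustic_end, nxt.acoustic_start)
            + w_visual * _dist(prev.visual_end, nxt.visual_start))


def select_units(candidates: List[List[BimodalUnit]],
                 target_cost: Callable[[int, BimodalUnit], float],
                 w_acoustic: float = 1.0,
                 w_visual: float = 1.0) -> List[BimodalUnit]:
    """Viterbi search; candidates[i] lists the units available for target position i."""
    if not candidates or any(not c for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[k]: cheapest cumulative cost of a path ending in candidate k of the current position.
    best = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []

    for i in range(1, len(candidates)):
        new_best, pointers = [], []
        for u in candidates[i]:
            costs = [best[j] + join_cost(p, u, w_acoustic, w_visual)
                     for j, p in enumerate(candidates[i - 1])]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min] + target_cost(i, u))
            pointers.append(j_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[i][k] for i, k in enumerate(path)]
```

Because each unit carries both modalities, the selected sequence yields audio and face animation from the same recorded segments. In this sketch, setting `w_visual` to zero reduces the search to acoustic-only selection.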

Similar resources

Visual control of hidden-semi-Markov-model based acoustic speech synthesis

We show how to visually control acoustic speech synthesis by modelling the dependency between visual and acoustic parameters within the Hidden-Semi-Markov-Model (HSMM) based speech synthesis framework. A joint audio-visual model is trained with 3D facial marker trajectories as visual features. Since the dependencies of acoustic features on visual features are only present for certain phones, we...

Acoustic-visual synthesis technique using bimodal unit-selection

This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker’s outer face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. In the visual domain, we mainly focus on the dynamics of the face rather than on rendering. The proposed technique overcom...

Introducing visual target cost within an acoustic-visual unit-selection speech synthesizer

In this paper, we present a method to take into account visual information during the selection process in an acoustic-visual synthesizer. The acoustic-visual speech synthesizer is based on the selection and concatenation of synchronous bimodal diphone units i.e., speech signal and 3D facial movements of the speaker’s face. The visual speech information is acquired using a stereovision techniqu...

Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted Speech

Within the framework of developing an expressive audiovisual speech synthesis, an acoustic and visual analysis of expressive acted speech is proposed in this paper. Our purpose is to identify the main characteristics of audiovisual expressions that need to be integrated during synthesis to provide believable emotions to the virtual 3D talking head. We conducted a case study of a semi-profession...

Setup for acoustic-visual speech synthesis by concatenating bimodal units

This paper presents preliminary work on building a system able to synthesize concurrently the speech signal and a 3D animation of the speaker’s face. This is done by concatenating bimodal diphone units, that is, units that comprise both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence...

Publication date: 2010