Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis

Authors

Ingmar Steiner

Korin Richmond

Slim Ouni

Abstract

The importance of modeling speech articulation for high-quality audiovisual (AV) speech synthesis is widely acknowledged. Nevertheless, while state-of-the-art, data-driven approaches to facial animation can make use of sophisticated motion capture techniques, the animation of the intraoral articulators (viz. the tongue, jaw, and velum) typically makes use of simple rules or viseme morphing, in stark contrast to the otherwise high quality of facial modeling. Using appropriate speech production data could significantly improve the quality of articulatory animation for AV synthesis.

Similar resources

Artimate: an articulatory animation framework for audiovisual speech synthesis

We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated in an audiovisual (AV) speech synthesis platform to provide...

Studies of audiovisual speech perception using production-based animation

This paper will summarize our work at Queen's University and ATR Laboratories on cross-modal speech perception and production. Our approach has been to study these two sides of speech together and to use the multi-modal speech production data to parameterize and control audiovisual animation systems. Two approaches to production-based facial animation have been pursued — one statistical and the...

Audio-Visual Correlation Modeling for Speaker Identification and Synthesis

This thesis addresses two major problems of multimodal signal processing using audiovisual correlation modeling: speaker recognition and speaker synthesis. We address the first problem, i.e., the audiovisual speaker recognition problem within an open-set identification framework, where audio (speech) and lip texture (intensity) modalities are fused employing a combination of early and late inte...

Evaluation of A Viseme-Driven Talking Head

This paper introduces a three-dimensional virtual head for use in speech tutoring applications. The system achieves audiovisual speech synthesis using viseme-driven animation and a coarticulation model, to automatically generate speech from text. The talking head was evaluated using a modified rhyme test for intelligibility. The audiovisual speech animation was found to give higher intelligibil...

2D Audiovisual Text-to-Speech Synthesis for Human-Machine Interaction in Dutch

Speech has always been the most important means of communication between humans. Therefore, using speech in machine-human communication can help in increasing the naturalness of the communication between a computer system and a user. Systems that can make a machine pronounce any given input text are referred to as text-to-speech systems. To further enhance the communication, a talking head can ...

Publication date: 2012
