Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data
Authors
Abstract
One of the essential problems in synthesizing an expressive talking avatar is how to model the interactions between emotional facial expressions and lip movements. Traditional methods either simplify these interactions by modeling lip movements and facial expressions separately, or require a substantial amount of high-quality emotional audio-visual bimodal training data, which is usually difficult to collect. This paper proposes several methods that explore different ways of capturing the interactions using a large-scale neutral corpus in addition to a small emotional corpus with a limited amount of data. To incorporate contextual influences, a deep bidirectional long short-term memory (DBLSTM) recurrent neural network is adopted as the regression model to predict facial features from acoustic features, emotional states, and contexts. Experimental results indicate that the method that concatenates neutral facial features with emotional acoustic features as the input of the DBLSTM model achieves the best performance in both objective and subjective evaluations.
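The regression setup described above (a deep bidirectional LSTM mapping per-frame acoustic features plus an emotional state to facial features) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: it assumes PyTorch, and all layer sizes, feature dimensions, and the name `DBLSTMRegressor` are illustrative choices.

```python
import torch
import torch.nn as nn

class DBLSTMRegressor(nn.Module):
    """Illustrative DBLSTM regression model: acoustic + emotion -> facial features."""
    def __init__(self, acoustic_dim=39, emotion_dim=4, facial_dim=30,
                 hidden=128, layers=2):
        super().__init__()
        # Deep (stacked) bidirectional LSTM over the input frame sequence
        self.blstm = nn.LSTM(acoustic_dim + emotion_dim, hidden,
                             num_layers=layers, bidirectional=True,
                             batch_first=True)
        # Linear head regresses facial features from each frame's hidden state
        self.out = nn.Linear(2 * hidden, facial_dim)

    def forward(self, acoustic, emotion):
        # Broadcast the per-utterance emotion code to every frame,
        # then concatenate it with the acoustic features
        emo = emotion.unsqueeze(1).expand(-1, acoustic.size(1), -1)
        h, _ = self.blstm(torch.cat([acoustic, emo], dim=-1))
        return self.out(h)

model = DBLSTMRegressor()
acoustic = torch.randn(2, 50, 39)             # 2 utterances, 50 frames each
emotion = torch.eye(4)[torch.tensor([0, 2])]  # one-hot emotional states
facial = model(acoustic, emotion)
print(facial.shape)  # torch.Size([2, 50, 30])
```

The paper's best-performing variant concatenates predicted neutral facial features with emotional acoustic features at the input; in a sketch like this, that would simply enlarge the input dimension of the first LSTM layer.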
Similar Papers
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar
Facial expression is one of the most expressive ways for human beings to deliver their emotion, intention, and other nonverbal messages in face to face communications. In this chapter, a layered parametric framework is proposed to synthesize the emotional facial expressions for an MPEG4 compliant talking avatar based on the three dimensional PAD model, including pleasure-displeasure, arousal-no...
Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar
Facial expression plays an important role in face to face communication in that it conveys nonverbal information and emotional intent beyond speech. In this paper, an approach for facial expression synthesis with an expressive Chinese talking avatar is proposed, where a layered parametric framework is designed to synthesize intermediate facial expressions using PAD emotional parameters [5], whi...
Data-driven synthesis of expressive visual speech using an MPEG-4 talking head
This paper describes initial experiments with synthesis of visual speech articulation for different emotions, using a newly developed MPEG-4 compatible talking head. The basic problem with combining speech and emotion in a talking head is to handle the interaction between emotional expression and articulation in the orofacial region. Rather than trying to model speech and emotion as two separat...
Evaluation of the Expressivity of a Swedish Talking Head in the Context of Human-machine Interaction
This paper describes a first attempt at synthesis and evaluation of expressive visual articulation using an MPEG-4 based virtual talking head. The synthesis is data-driven, trained on a corpus of emotional speech recorded using optical motion capture. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. In order to evaluate the expressivi...