Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data
Authors
Abstract
One of the essential problems in synthesizing an expressive talking avatar is how to model the interactions between emotional facial expressions and lip movements. Traditional methods either simplify these interactions by modeling lip movements and facial expressions separately, or require a substantial amount of high-quality emotional audio-visual bimodal training data, which is usually difficult to collect. This paper proposes several methods that explore different ways of capturing the interactions using a large-scale neutral corpus in addition to a small emotional corpus with a limited amount of data. To incorporate contextual influences, a deep bidirectional long short-term memory (DBLSTM) recurrent neural network is adopted as the regression model to predict facial features from acoustic features, emotional states, and contexts. Experimental results indicate that the method that concatenates neutral facial features with emotional acoustic features as the input of the DBLSTM model achieves the best performance in both objective and subjective evaluations.
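The regression setup described above (a deep bidirectional LSTM mapping per-frame acoustic features plus an emotional state to facial features) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: it assumes PyTorch, and all layer sizes, feature dimensions, and the name `DBLSTMRegressor` are illustrative choices.

```python
import torch
import torch.nn as nn

class DBLSTMRegressor(nn.Module):
    """Illustrative DBLSTM regression model: acoustic + emotion -> facial features."""
    def __init__(self, acoustic_dim=39, emotion_dim=4, facial_dim=30,
                 hidden=128, layers=2):
        super().__init__()
        # Deep (stacked) bidirectional LSTM over the input frame sequence
        self.blstm = nn.LSTM(acoustic_dim + emotion_dim, hidden,
                             num_layers=layers, bidirectional=True,
                             batch_first=True)
        # Linear head regresses facial features from each frame's hidden state
        self.out = nn.Linear(2 * hidden, facial_dim)

    def forward(self, acoustic, emotion):
        # Broadcast the per-utterance emotion code to every frame,
        # then concatenate it with the acoustic features
        emo = emotion.unsqueeze(1).expand(-1, acoustic.size(1), -1)
        h, _ = self.blstm(torch.cat([acoustic, emo], dim=-1))
        return self.out(h)

model = DBLSTMRegressor()
acoustic = torch.randn(2, 50, 39)             # 2 utterances, 50 frames each
emotion = torch.eye(4)[torch.tensor([0, 2])]  # one-hot emotional states
facial = model(acoustic, emotion)
print(facial.shape)  # torch.Size([2, 50, 30])
```

The paper's best-performing variant concatenates predicted neutral facial features with emotional acoustic features at the input; in a sketch like this, that would simply enlarge the input dimension of the first LSTM layer.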
Similar Papers
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar
Facial expression is one of the most expressive ways for human beings to deliver their emotion, intention, and other nonverbal messages in face to face communications. In this chapter, a layered parametric framework is proposed to synthesize the emotional facial expressions for an MPEG4 compliant talking avatar based on the three dimensional PAD model, including pleasure-displeasure, arousal-no...
Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar
Facial expression plays an important role in face to face communication in that it conveys nonverbal information and emotional intent beyond speech. In this paper, an approach for facial expression synthesis with an expressive Chinese talking avatar is proposed, where a layered parametric framework is designed to synthesize intermediate facial expressions using PAD emotional parameters [5], whi...
Data-driven synthesis of expressive visual speech using an MPEG-4 talking head
This paper describes initial experiments with synthesis of visual speech articulation for different emotions, using a newly developed MPEG-4 compatible talking head. The basic problem with combining speech and emotion in a talking head is to handle the interaction between emotional expression and articulation in the orofacial region. Rather than trying to model speech and emotion as two separat...
Evaluation of the Expressivity of a Swedish Talking Head in the Context of Human-machine Interaction
This paper describes a first attempt at synthesis and evaluation of expressive visual articulation using an MPEG-4 based virtual talking head. The synthesis is data-driven, trained on a corpus of emotional speech recorded using optical motion capture. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. In order to evaluate the expressivi...