Audio-Visual Emotion Recognition Using Semi-Coupled HMM and Error-Weighted Classifier Combination
Abstract
This paper presents an approach to the automatic recognition of emotional states from audio-visual bimodal signals for Human-Computer Interaction (HCI), using a semi-coupled hidden Markov model and error-weighted classifier combination. The proposed model combines a simplified state-based bimodal alignment strategy with a Bayesian classifier weighting scheme to obtain the optimal solution for audio-visual bimodal fusion. The state-based bimodal alignment strategy aligns the temporal relation of the states between the audio and visual streams. The Bayesian classifier weighting scheme explores the contributions of different audio-visual feature pairs to emotion recognition. For performance evaluation, audio-visual signals with four emotional states (happy, neutral, angry and sad) were collected. Each of the four invited subjects was asked to utter 10 sentences to generate emotional speech and facial expressions for each emotion. Experimental results show the efficiency and effectiveness of the proposed method.
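The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea behind error-weighted classifier combination, not the paper's Bayesian scheme. It assumes each classifier (for example, one per audio-visual feature pair) outputs a posterior over the four emotions, and that each classifier's weight is proportional to one minus its error rate on held-out data. The function names, error rates, and posterior values are all hypothetical.

```python
# Illustrative sketch only: weights proportional to (1 - error rate) are an
# assumption, not the weighting rule from the paper.

EMOTIONS = ["happy", "neutral", "angry", "sad"]


def error_weights(error_rates):
    """Map per-classifier error rates to weights that sum to 1.

    A classifier with a lower error rate receives a larger weight.
    """
    accuracies = [1.0 - e for e in error_rates]
    total = sum(accuracies)
    return [a / total for a in accuracies]


def combine(posteriors, weights):
    """Return the weighted average of the classifiers' posteriors.

    posteriors: one list of class probabilities per classifier.
    weights:    one weight per classifier.
    """
    n_classes = len(posteriors[0])
    scores = [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical held-out error rates for three classifiers.
error_rates = [0.20, 0.40, 0.30]

# Hypothetical posteriors over EMOTIONS for one test utterance.
posteriors = [
    [0.60, 0.20, 0.10, 0.10],
    [0.10, 0.50, 0.20, 0.20],
    [0.40, 0.30, 0.20, 0.10],
]

weights = error_weights(error_rates)
combined = combine(posteriors, weights)
predicted = EMOTIONS[combined.index(max(combined))]
print(predicted)
```

In this sketch the second classifier favors "neutral", but it has the highest error rate and therefore the smallest weight, so the combined decision follows the two more reliable classifiers.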
Similar resources
Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy
In this paper an in-depth analysis is undertaken into effective strategies for integrating the audio-visual modalities for the purposes of text-dependent speaker recognition. Our work is based around the well-known hidden Markov model (HMM) classifier framework for modelling speech. A framework is proposed to handle the mismatch between train and test observation sets, so as to provide effectiv...
An investigation of HMM classifier combination strategies for improved audio-visual speech recognition
The combining of independent audio and visual HMM classifiers (late integration) has been shown to outperform the combination of audio and visual features in a single HMM classifier (early integration) when either or both modalities are presented with distortion for the task of speech recognition. Theoretical foundations for the optimal combination of these audio and video classifiers are stil...
Mandarin Audio-visual Speech Recognition with Effects to the Noise and Emotion
This paper presents a Mandarin audio-visual recognition system dealing with noisy and emotional speech signals. In the proposed approach, we extract the visual features of the lips. These features are very important to the recognition system, especially in noisy conditions or with emotional effects. In this recognition system, we propose to use the weighted-discrete KNN as the classifier and compa...
Recognition and Classification of Human Emotion from Audio
In this paper, an audio emotion recognition system is proposed that uses a mixture of rule-based and machine learning techniques to improve the recognition efficacy in the audio paths. The audio path is designed using a combination of input prosodic features (pitch, log-energy, zero crossing rates and Teager energy operator) and spectral features (Mel-scale frequency cepstral coefficients). Me...
Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons
Jointly using audio and video features can increase the robustness of automatic speech recognition systems in noisy environments. A systematic and reliable performance gain, however, is only achieved if the contributions of the audio and video stream to the decoding decision are dynamically optimized, for example via so-called stream weights. In this paper, we address the problem of dynamic str...