Psychological Motivated Multi-Stage Emotion Classification Exploiting Voice Quality Features

Authors

  • Marko Lugger
  • Bin Yang
Abstract

Paralinguistic properties play an increasingly decisive role in recent speech processing systems such as automatic speech recognition (ASR) and natural text-to-speech (TTS) systems. Besides the linguistic information, these so-called paralinguistic properties can help resolve ambiguities in man-machine interaction. Nowadays, such man-machine interfaces can be found, for example, in call centers, in driver assistance systems, and on the personal computer at home. There are many different applications for the recognition of paralinguistic properties, e.g. gender, age, voice quality, emotion, or alcohol consumption. Among these properties, the emotional state of a speaker holds a prominent position because it strongly affects the acoustic signal produced by the speaker in all kinds of conversational speech. Emotion recognition has applications in various fields, e.g. in call centers to detect angry customers, in entertainment electronics, in linguistics, and even in politics, where speeches of politicians are analysed to train candidates for election campaigns.

Various attempts show quite good results for speaker-dependent classification (Lugger & Yang, 2006; McGilloway et al., 2000; Nogueiras et al., 2001). But the hardest task, and also the most relevant in practice, is speaker-independent emotion recognition. Speaker-independent means that the speaker of the classified utterances is not included in the training database of the system: the speaker is unknown to the classifier and to the learning rules deduced in the training phase. Up to now, good speaker-independent emotion recognition could only be achieved by using very large feature sets in combination with very complex classifiers (Schuller et al., 2006; Lee & Narayanan, 2005). In (Schuller et al., 2006), an average recognition rate of 86.7% was achieved for seven emotions by using 4000 features and a support vector machine as classifier.
In this work, our goal is to further improve the classification performance of speaker-independent emotion recognition and to make it more applicable to real-time systems. To this end, we use the same German database, considering six basic emotions: sadness, boredom, neutral, anxiety, happiness, and anger. Disgust is ignored for lack of relevance. In contrast to other approaches, we focus on extracting fewer but better-adapted features for emotion recognition. Because feature extraction is the most time-consuming part of the whole processing chain in real-time systems, we try to reduce the number of features. At the same time, we study multi-stage classifiers to optimally adjust the reduced feature set to the different class discriminations during classification. In comparison to support vector machines or neural networks, the Bayesian classifier we use can be ...
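To make the two ideas above concrete (speaker-independent evaluation and a multi-stage Bayesian classifier that uses a different small feature subset per stage), the following is a minimal sketch and not the authors' implementation. Everything specific in it is an assumption made for illustration: the grouping of emotions into a "high" and a "low" group for the first stage, the diagonal-covariance Gaussian model, the feature indices, the helper names (`GaussianBayes`, `TwoStageClassifier`, `leave_one_speaker_out`), and the toy data.

```python
import math
from collections import defaultdict


def pick(x, idx):
    """Select the feature subset used by one stage."""
    return [x[i] for i in idx]


class GaussianBayes:
    """Bayesian classifier with diagonal Gaussian class densities (sketch assumption)."""

    def fit(self, X, y):
        rows_by_label = defaultdict(list)
        for x, label in zip(X, y):
            rows_by_label[label].append(x)
        self.stats = {}
        for label, rows in rows_by_label.items():
            dims = list(zip(*rows))
            means = [sum(d) / len(d) for d in dims]
            # Small constant keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in d) / len(d) + 1e-6
                         for d, m in zip(dims, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances))
        return max(self.stats, key=log_posterior)


class TwoStageClassifier:
    """Stage 1 picks a group of emotions, stage 2 picks the emotion within it."""

    def __init__(self, groups, stage1_features, stage2_features):
        self.groups = groups            # group name -> set of emotions (hypothetical)
        self.f1 = stage1_features       # feature indices for stage 1
        self.f2 = stage2_features       # group name -> feature indices for stage 2

    def _group_of(self, emotion):
        return next(g for g, members in self.groups.items() if emotion in members)

    def fit(self, X, y):
        g = [self._group_of(e) for e in y]
        self.stage1 = GaussianBayes().fit([pick(x, self.f1) for x in X], g)
        self.stage2 = {}
        for name in self.groups:
            idx = [i for i, gi in enumerate(g) if gi == name]
            self.stage2[name] = GaussianBayes().fit(
                [pick(X[i], self.f2[name]) for i in idx], [y[i] for i in idx])
        return self

    def predict(self, x):
        name = self.stage1.predict(pick(x, self.f1))
        return self.stage2[name].predict(pick(x, self.f2[name]))


def leave_one_speaker_out(speakers):
    """Yield splits in which the test speaker never appears in the training set."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


# Toy data (invented): feature 0 separates the groups, feature 1 the emotions.
GROUPS = {"high": {"anger", "happiness"}, "low": {"sadness", "boredom"}}
PROTOTYPES = {"anger": (10.0, 0.0), "happiness": (10.0, 5.0),
              "sadness": (0.0, 0.0), "boredom": (0.0, 5.0)}
X, y, speakers = [], [], []
for k, spk in enumerate(["s1", "s2", "s3"], start=1):
    for emotion, (a, b) in PROTOTYPES.items():
        X.append([a + 0.1 * k, b + 0.1 * k])   # small speaker-specific offset
        y.append(emotion)
        speakers.append(spk)

correct = 0
for _, train, test in leave_one_speaker_out(speakers):
    model = TwoStageClassifier(GROUPS, [0], {"high": [1], "low": [1]})
    model.fit([X[i] for i in train], [y[i] for i in train])
    correct += sum(model.predict(X[i]) == y[i] for i in test)
accuracy = correct / len(X)
print(f"speaker-independent accuracy on toy data: {accuracy:.2f}")
```

In this sketch each stage sees only the features assigned to it, which is one way to read "adjusting the reduced feature set to the different class discriminations"; how the stages and feature subsets are actually chosen is not stated in the abstract.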

Similar resources

Toward Multi-modal Music Emotion Classification

The performance of categorical music emotion classification that divides emotion into classes and uses audio features alone for emotion classification has reached a limit due to the presence of a semantic gap between the object feature level and the human cognitive level of emotion perception. Motivated by the fact that lyrics carry rich semantic information of a song, we propose a multi-modal ...

Creaky voice and the classification of affect

Creaky voice, a phonation type involving a low frequency and often highly irregular vocal fold vibration, has the potential both to indicate emotion and to confuse automatic processing algorithms, making it both friend and enemy to emotion recognition systems. This paper explores the effect of creak on affect classification performance using hidden Markov models (HMMs) trained on a variety of v...

Image Color Reduction Using Multi-Stage Self-Organizing Neural Networks and Redundant Features

Reducing the number of colors in an image while preserving its quality is important in many applications such as image analysis and compression. It also decreases memory and transmission bandwidth requirements. Moreover, classification of image colors is applicable in image segmentation and object detection and separation, as well as in producing pseudo-color images. In this paper, the Kohene...

Identification of Houseplants Using Neuro-vision Based Multi-stage Classification System

In this paper, we present a machine vision system that was developed on the basis of neural networks to identify twelve houseplants. An image processing system was used to extract 41 features of color, texture and shape from images taken of the front and back of the leaves. The features were fed into the neural network system as the recognition criteria and inputs. Multilayer perceptron (MLP) ne...

Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, ...

Publication date: 2012