Emotion Recognition
Abstract
Studies of expressive speech have shown that discrete emotions such as anger, fear, joy, and sadness can be accurately communicated, even across cultures, and that each emotion is associated with reasonably specific acoustic characteristics [8]. However, most previous research has been conducted on acted emotions. These certainly have something in common with naturally occurring emotions but may also be more intense and prototypical than authentic, everyday expressions [6, 13]. Authentic emotions, on the other hand, are often a combination of different affective states and occur rather infrequently in everyday life. They are, moreover, often restricted to only a few emotions, such as joy or frustration [3, 5, 7].

In the CHIL project, we are interested in acoustic emotion recognition in two of the given scenarios: the Socially Supportive Workspaces and the Connector agent. In the first, we want to monitor people attending a lecture in order to give the speaker feedback on the attentive states of the audience: are they positive and laughing, indifferent and bored, or negative and irritated? In the Connector scenario, in which somebody tries to reach a person on the phone via the Connector agent, it is of interest to know whether the caller is starting to show frustration.

Since no relevant CHIL material had been recorded and made available for our research, we decided to use two other corpora that we considered to be rather close to what we needed; they were recorded in circumstances and environments similar to the CHIL scenarios. For the Socially Supportive Workspaces, we used a corpus of small-group interaction collected at CMU and known as the ISL Meeting Corpus [2]. For the Connector agent, we used a database from the Swedish telephone service company Voice Provider that contains recordings of people interacting with automatic voice response centers [12]. The emotional context of the ISL Corpus is human-human interaction, and the emotions conveyed are mainly positive, often associated with laughter. The emotional context of the Voice Provider Corpus is human-machine interaction; the emotions are rather rare and mostly negative, owing to frustration with the performance of the automatic voice response system. Thus, both databases contain authentic rather than acted emotions in settings similar to the CHIL scenarios we want to explore. However, their diverse contexts make the task for the emotion recognizer quite different. In the following, we will first describe
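To make the task concrete, the following is a minimal, hypothetical sketch of the kind of utterance-level acoustic emotion classifier such scenarios call for. The feature set, the librosa/scikit-learn pipeline, and the two-class setup (neutral vs. emotional) are illustrative assumptions, not the system described in this chapter.

    import numpy as np
    import librosa
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def utterance_features(wav_path):
        """Summarize one utterance with simple prosodic/spectral statistics."""
        y, sr = librosa.load(wav_path, sr=16000)
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch contour
        rms = librosa.feature.rms(y=y)[0]                   # energy contour
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral shape
        return np.concatenate([
            [f0.mean(), f0.std()],      # pitch level and variability
            [rms.mean(), rms.std()],    # loudness level and variability
            mfcc.mean(axis=1), mfcc.std(axis=1),
        ])

    # train_paths and train_labels are assumed to come from a labeled corpus
    # such as the ISL Meeting Corpus or the Voice Provider recordings.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # X = np.vstack([utterance_features(p) for p in train_paths])
    # clf.fit(X, train_labels)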
Similar Papers
Data-driven emotion conversion in spoken English
This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods has been developed that utilises a small amount of expressive training data (15 min) and that has been evaluated for three target emotions: anger, surprise, and sadness. The system perfo...
Combining Ranking and Classification to Improve Emotion Recognition in Spontaneous Speech
We introduce a novel emotion recognition approach which integrates ranking models. The approach is speaker-independent, yet it is designed to exploit information from utterances from the same speaker in the test set before making predictions. It achieves much higher precision in identifying emotional utterances than a conventional SVM classifier. Furthermore, we test several possibilities for co...
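As a hedged illustration of the within-speaker ranking idea this abstract describes (the function name and the top-k heuristic are assumptions, not the authors' method), one could score all test utterances from a single speaker with any margin classifier and flag only the highest-ranked ones, rather than thresholding each utterance in isolation:

    import numpy as np

    def flag_emotional(scores, k=3):
        """Given per-utterance classifier scores for one speaker, mark the
        k highest-scoring utterances as emotional."""
        order = np.argsort(scores)[::-1]          # best-scoring first
        flags = np.zeros(len(scores), dtype=bool)
        flags[order[:k]] = True
        return flags

    # scores could come from, e.g., clf.decision_function(X_one_speaker);
    # grouping utterances by speaker is what lets relative ranking help.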
Classifying emotions in human-machine spoken dialogs
This paper reports on the comparison between various acoustic feature sets and classification algorithms for classifying spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogs obtained from a commercial application. Emotion recognition is posed as a pattern recognition problem. We used three different techni...
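A sketch of what such a classifier comparison might look like in practice is given below; the particular classifiers and the synthetic feature matrix are placeholders, not the techniques the paper actually evaluates:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    # Placeholder data standing in for utterance-level acoustic features
    # and binary emotion labels (e.g., negative vs. non-negative).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    y = rng.integers(0, 2, size=200)

    for name, model in {
        "svm": SVC(kernel="rbf"),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "logreg": LogisticRegression(max_iter=1000),
    }.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name}: {acc:.3f}")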
Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems
To investigate emotion recognition from acoustic information, we recorded natural dialogs among two or three players of online games to construct an emotional speech database. Two evaluators categorized the recorded utterances into emotion categories defined with reference to the eight primary emotions of Plutchik’s three-dimensional circumplex model. Furthermore, 14 evalua...
Automatic Emotion Recognition by the Speech Signal
This paper discusses approaches to recognizing the emotional user state by analyzing spoken utterances on both the semantic and the signal level. We classify seven emotions: joy, anger, irritation, fear, disgust, sadness, and a neutral inner state. The introduced methods analyze the wording, the degree of verbosity, the temporal intention rate, as well as the history of user utterances. As prosodic...