Detecting paralinguistic events in audio stream using context in features and probabilistic decisions
Abstract
Non-verbal communication involves the encoding, transmission and decoding of non-lexical cues and is realized through vocal (e.g. prosody) or visual (e.g. gaze, body language) channels during conversation. These cues serve to maintain conversational flow, express emotions, and mark personality and interpersonal attitude. In particular, non-verbal cues in speech such as paralanguage and non-verbal vocal events (e.g. laughter, sighs, cries) are used to nuance meaning and convey emotions, mood and attitude. For instance, laughter is associated with affective expressions, while fillers (e.g. um, ah) are used to hold the floor during a conversation. In this paper we present an automatic non-verbal vocal event detection system focusing on the detection of laughter and fillers. We extend our system presented at the Interspeech 2013 Social Signals Sub-challenge (the winning entry in the challenge) for frame-wise event detection and test several schemes for incorporating local context during detection. Specifically, we incorporate context at two separate levels in our system: (i) the raw frame-wise features and (ii) the output decisions. Furthermore, our system processes the output probabilities using a few heuristic rules in order to reduce erroneous frame-based predictions. Our overall system achieves an Area Under the Receiver Operating Characteristic curve of 95.3% for detecting laughter and 90.4% for fillers on the test set drawn from the data specifications of the Interspeech 2013 Social Signals Sub-challenge. We perform further analysis to understand the interrelation between the features and the obtained results. Specifically, we conduct a feature sensitivity analysis and correlate it with each feature's standalone performance. The observations suggest that the trained system is more sensitive to features carrying higher discriminability, with implications for better system design.
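The abstract names two places where local context enters the system (the frame-wise features and the output decisions) plus heuristic post-processing of the output probabilities, but it does not specify the schemes or the rules. The following is only a minimal sketch of what such steps could look like, under stated assumptions: the function names, the edge-padding choice, the moving-average smoother, and the minimum-duration rule are all hypothetical illustrations, not the authors' method.

```python
from typing import List, Sequence


def stack_context(frames: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Feature-level context: concatenate each frame with its k neighbours per side.

    Assumption: edges are padded by repeating the first/last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row: List[float] = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index to the valid range
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def smooth_probabilities(probs: Sequence[float], k: int) -> List[float]:
    """Decision-level context: moving average over up to 2k+1 frames.

    Assumption: the window is truncated (not padded) at the sequence edges.
    """
    n = len(probs)
    out = []
    for t in range(n):
        lo, hi = max(t - k, 0), min(t + k + 1, n)
        window = probs[lo:hi]
        out.append(sum(window) / len(window))
    return out


def suppress_short_events(probs: Sequence[float], threshold: float,
                          min_len: int) -> List[float]:
    """Hypothetical heuristic: zero out above-threshold runs shorter than min_len."""
    out = list(probs)
    start = None  # start index of the current above-threshold run, if any
    for t in range(len(out) + 1):
        active = t < len(out) and out[t] >= threshold
        if active and start is None:
            start = t
        elif not active and start is not None:
            if t - start < min_len:
                for i in range(start, t):
                    out[i] = 0.0
            start = None
    return out
```

For example, `stack_context([[1.0], [2.0], [3.0]], 1)` gives each frame a three-value feature vector, and `suppress_short_events(p, 0.5, 2)` removes isolated single-frame detections from a probability sequence `p` while leaving longer runs untouched.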
Related articles
Detection of children's paralinguistic events in interaction with caregivers
Paralinguistic cues in children’s speech convey the child’s affective state and can serve as important markers for the early detection of autism spectrum disorder (ASD). In this paper, we detect paralinguistic events, such as laughter and fussing/crying, along with toddlers’ speech from the Multi-modal Dyadic Behavior Dataset (MMDB). We use both spectral and prosodic acoustic features selected ...
Automatic detection of laughter
In the context of detecting ‘paralinguistic events’ with the aim to make classification of the speaker’s emotional state possible, a detector was developed for one of the most obvious ‘paralinguistic events’, namely laughter. Gaussian Mixture Models were trained with Perceptual Linear Prediction features, pitch&energy, pitch&voicing and modulation spectrum features to model laughter and speech....
Paralinguistic Analysis of Children's Speech in Natural Environments
Paralinguistic cues are the non-phonemic aspects of human speech that convey information about the affective state of the speaker. In children’s speech, these events are also important markers for the detection of early developmental disorders. Detecting these events in hours of audio data would be beneficial for clinicians to analyze the social behaviors of children. The chapter focuses on the...
A Study of the Relationship between Acoustic Features of “bæle” and the Paralinguistic Information
Language users benefit from special phonetic tools in order to communicate linguistic information as well as different emotional aspects and paralinguistic information through daily conversation. Having functions in conveying semantic information to listeners, prosodic features form the essential part of linguistic behaviour; manipulating them potentially can play an important role in transmitt...
Paralinguistic event detection from speech using probabilistic time-series smoothing and masking
Non-verbal speech cues serve multiple functions in human interaction such as maintaining the conversational flow as well as expressing emotions, personality, and interpersonal attitude. In particular, non-verbal vocalizations such as laughters are associated with affective expressions while vocal fillers are used to hold the floor during a conversation. The Interspeech 2013 Social Signals Sub-C...
Journal:
- Computer Speech & Language
Volume 36
Publication date: 2016