Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors
Authors
Abstract
Speaker state recognition is a challenging problem due to speaker and context variability. Intoxication detection is an important area of paralinguistic speech research with potential real-world applications. In this work, we build upon a base set of static acoustic features by combining several different methods for this learning task. The methods include extracting hierarchical acoustic features, performing iterative speaker normalization, and using a set of GMM supervectors. We obtain our best unweighted average recall for intoxication recognition using score-level fusion of these subsystems. Unweighted average recall is 70.54% on the test set, an improvement of 4.64% absolute (7.04% relative) over the baseline of 65.9%.
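As a rough illustration of the evaluation described in the abstract, the sketch below computes unweighted average recall (the mean of the per-class recalls) on fused scores and re-derives the reported absolute and relative gains. The two subsystem score lists, the equal fusion weights, the 0.5 decision threshold, and the function names are all hypothetical; the abstract does not state how the subsystems were weighted or calibrated.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def fuse_scores(subsystem_scores, weights):
    """Score-level fusion as a weighted sum of per-utterance subsystem scores."""
    n = len(subsystem_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, subsystem_scores))
        for i in range(n)
    ]


# Toy labels (1 = intoxicated, 0 = sober) and made-up scores from two subsystems.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
hierarchical_scores = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1, 0.5, 0.2]
supervector_scores = [0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.7, 0.1]

# Equal weights are an assumption made for this illustration only.
fused = fuse_scores([hierarchical_scores, supervector_scores], [0.5, 0.5])
y_pred = [1 if s >= 0.5 else 0 for s in fused]
uar = unweighted_average_recall(y_true, y_pred)

# Re-derive the gains quoted in the abstract from its two UAR figures.
baseline_uar, fused_uar = 65.9, 70.54
absolute_gain = fused_uar - baseline_uar
relative_gain = 100 * absolute_gain / baseline_uar
print(f"toy UAR: {uar:.2f}")
print(f"absolute gain: {absolute_gain:.2f}, relative gain: {relative_gain:.2f}%")
```

On imbalanced data such as this toy set, unweighted average recall differs from plain accuracy because a single error in the smaller class costs more than one in the larger class.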
Similar resources
Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors
Segmental and suprasegmental speech signal modulations offer information about paralinguistic content such as affect, age and gender, pathology, and speaker state. Speaker state encompasses medium-term, temporary physiological phenomena influenced by internal or external biochemical actions (e.g., sleepiness, alcohol intoxication). Perceptual and computational research indicates that detecting ...
Speaker dependent emotion recognition using prosodic supervectors
This work presents a novel approach for detection of emotions embedded in the speech signal. The proposed approach works at the prosodic level, and models the statistical distribution of the prosodic features with Gaussian Mixture Models (GMM) mean-adapted from a Universal Background Model (UBM). This allows the use of GMM-mean supervectors, which are classified by a Support Vector Machine (SVM...
Read and spontaneous speech classification based on variance of GMM supervectors
This paper provides a novel method to classify spoken utterances into reading style or spontaneous style. Read/spontaneous speech classification is important for extracting data to train acoustic models for speech recognition from real data in which read speech and spontaneous speech samples are mixed. We analyzed 23,900 reading and 31,988 spontaneous utterances of 30 speakers and found that va...
Combining five acoustic level modeling methods for automatic speaker age and gender recognition
This paper presents a novel automatic speaker age and gender identification approach which combines five different methods at the acoustic level to improve the baseline performance. The five subsystems are (1) Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) Support vector machine (SVM) based on GMM mean supervectors, (3) SVM based on GMM maxi...
Automatic speaker age and gender recognition using acoustic and prosodic level information fusion
The paper presents a novel automatic speaker age and gender identification approach which combines seven different methods at both acoustic and prosodic levels to improve the baseline performance. The three baseline subsystems are (1) Gaussian mixture model (GMM) based on mel-frequency cepstral coefficient (MFCC) features, (2) Support vector machine (SVM) based on GMM mean supervectors and (3) SVM...