Gaussian Process Regression for Continuous Emotion Recognition with Global Temporal Invariance
Abstract
Continuous emotion recognition (CER) requires predicting time series of emotional parameters for query input time series, given training data in the form of matched pairs of input and output time series. To address this task, it is important to model not only relationships between points in the input and output spaces, but also temporal relationships between points within the output space. Gaussian process regression (GPR) is an inference technique with desirable properties for CER, including the ability to produce predictive distributions over the outputs rather than only point estimates. However, GPR is generally applied to pointwise prediction or interpolation tasks, rather than to predictions of entire functional outputs. We propose a covariance structure that incorporates both input-output and temporal information to produce predictions that take into account the functional nature of CER data. We demonstrate the method on simulated data and on the AVEC2016 CER task. GPR with this covariance structure predicts emotional arousal from audio with over twice the accuracy of a straightforward pointwise application of GPR in the input feature space, and its accuracy approaches that of a competitive CER system while using only very general component covariance models.
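The idea of combining input-space and temporal information in one covariance can be illustrated with a minimal sketch. The abstract does not specify the proposed covariance structure, so everything below is an assumption made for illustration: the kernel is a plain sum of a squared-exponential kernel over features and one over time, and the names `rbf`, `combined_kernel`, `gp_predict`, the length scales, and the synthetic data are all hypothetical. It is not the authors' method.

```python
import numpy as np


def rbf(a, b, length_scale):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def combined_kernel(xa, ta, xb, tb, ls_x=1.0, ls_t=1.0):
    """Assumed covariance: feature-space RBF plus temporal RBF (illustrative only)."""
    return rbf(xa, xb, ls_x) + rbf(ta[:, None], tb[:, None], ls_t)


def gp_predict(x_train, t_train, y_train, x_test, t_test, noise=0.1):
    """Standard GP posterior mean and covariance via a Cholesky factorisation."""
    K = combined_kernel(x_train, t_train, x_train, t_train)
    K += noise**2 * np.eye(len(y_train))
    K_s = combined_kernel(x_test, t_test, x_train, t_train)
    K_ss = combined_kernel(x_test, t_test, x_test, t_test)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov


# Synthetic stand-in for one recording: 2-D features observed over time.
rng = np.random.default_rng(0)
t_train = np.linspace(0.0, 10.0, 50)
x_train = np.column_stack([np.sin(t_train), np.cos(0.5 * t_train)])
x_train += 0.05 * rng.standard_normal(x_train.shape)
y_train = np.sin(t_train) + 0.1 * rng.standard_normal(t_train.shape)

t_test = np.linspace(0.0, 10.0, 20)
x_test = np.column_stack([np.sin(t_test), np.cos(0.5 * t_test)])

mean, cov = gp_predict(x_train, t_train, y_train, x_test, t_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise predictive std
```

Here `mean` is a point prediction for the whole test sequence and `cov` is the full predictive covariance across test time steps, which reflects the point in the abstract that GPR yields predictive distributions rather than only point estimates. A sum kernel is used only because it is the simplest valid combination; a product or a more structured form may be closer to what the paper proposes.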
Related articles
MediaEval 2015: A Segmentation-based Approach to Continuous Emotion Tracking
In this paper we approach the task of continuous music emotion recognition using unsupervised audio segmentation as a preparatory step. The MediaEval task requires predicting emotion of the song with a high time resolution of 2Hz. Though this resolution is necessary to find exact locations of emotional changes, we believe that those changes occur more sparsely. We suggest that using bigger time...
Hidden Markov model-based speech emotion recognition
In this contribution we introduce speech emotion recognition by use of continuous hidden Markov models. Two methods are propagated and compared throughout the paper. Within the first method a global statistics framework of an utterance is classified by Gaussian mixture models using derived features of the raw pitch and energy contour of the speech signal. A second method introduces increased te...
Music Emotion Recognition using Gaussian Processes
This paper describes the music emotion recognition system developed at the University of Aizu for the Emotion in Music task of the MediaEval’2013 benchmark evaluation campaign. A set of standard feature types provided by the Marsyas toolkit was used to parametrize each music clip. Arousal and valence are modeled separately using Gaussian Process regression (GPR). We compared performances of the...
Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks
During the last decade, speech emotion recognition technology has matured well enough to be used in some real-life scenarios. However, these scenarios require an almost silent environment to not compromise the performance of the system. Emotion recognition technology from speech thus needs to evolve and face more challenging conditions, such as environmental additive and convolutional noises, i...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...