Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis

نویسندگان

Gábor Gosztolya

Tamás Grósz

György Szaszák

László Tóth

چکیده

In the Sincerity Sub-Challenge of the Interspeech ComParE 2016 Challenge, the task is to estimate user-annotated sincerity scores for speech samples. We interpret this challenge as a ranklearning regression task, since the evaluation metric (Spearman’s correlation) is calculated from the rank of the instances. As a first approach, Deep Neural Networks are used by introducing a novel error criterion which maximizes the correlation metric directly. We obtained the best performance by combining the proposed error function with the conventional MSE error. This approach yielded results that outperform the baseline on the Challenge test set. Furthermore, we introduce a compact prosodic feature set based on a dynamic representation of F0, energy and sound duration. We extract syllable-based prosodic features which are used as the basis of another machine learning step. We show that a small set of prosodic features is capable of yielding a result very close to the baseline one and that by combining the predictions yielded by DNN and the prosodic feature set, further improvement can be reached, significantly outperforming the baseline SVR on the Challenge test set.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients

Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on the voice features in Parkinson’s disease (PD) is controversial. No study has evaluated the voice features of PD underwent STN-DBS by the acoustic, perceptual, and patient-based assessments comprehensively. Furthermore, there is no study to investigate prosodic features before and after DBS in PD. The curren...

متن کامل

Low-rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance Dnn Acoustic Modeling

We hypothesize that optimal deep neural networks (DNN) class-conditional posterior probabilities live in a union of lowdimensional subspaces. In real test conditions, DNN posteriors encode uncertainties which can be regarded as a superposition of unstructured sparse noise over the optimal posteriors. We aim to investigate different ways to structure the DNN outputs by exploiting low-rank repres...

متن کامل

The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners

The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...

متن کامل

DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation

This paper proposes a method to extract prosodic features from a speech signal by leveraging auxiliary linguistic information. A prosodic feature extractor called the statistical phrase/accent command estimation (SPACE) has recently been proposed. This extractor is based on a statistical model formulated as a stochastic counterpart of the Fujisaki model, a wellfounded mathematical model represe...

متن کامل

The sound of sarcasm

The present study was conducted to identify possible acoustic cues of sarcasm. Native English speakers produced a variety of simple utterances to convey four different attitudes: sarcasm, humour, sincerity, and neutrality. Following validation by a separate naı̈ve group of native English speakers, the recorded speech was subjected to acoustic analyses for the following features: mean fundamental...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis

نویسندگان

چکیده

منابع مشابه

The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients

Low-rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance Dnn Acoustic Modeling

The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners

DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation

The sound of sarcasm

عنوان ژورنال:

اشتراک گذاری