Using voice-quality measurements with prosodic and spectral features for speaker diarization

Authors

  • Abraham Woubie
  • Jordi Luque
  • Javier Hernando
Abstract

Jitter and shimmer voice-quality measurements have been successfully used to detect voice pathologies and to classify different speaking styles. In this paper, we investigate the usefulness of jitter and shimmer voice measurements in the framework of the speaker diarization task. The combination of jitter and shimmer voice-quality features with long-term prosodic and short-term spectral features is explored on a subset of the Augmented Multi-party Interaction (AMI) corpus, a set of multi-party, spontaneous speech recordings. The best results are obtained by fusing the voice-quality features with the prosodic ones at the feature level, and then fusing them with the spectral features at the score level. Experimental results show a relative improvement of more than 20% in diarization error rate (DER) over the spectral baseline system.
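The abstract names the ingredients but gives no formulas, so the following is only a minimal sketch under stated assumptions. It uses one common textbook definition of local jitter and local shimmer (mean absolute difference between consecutive pitch periods or peak amplitudes, divided by their mean), models feature-level fusion as plain concatenation, and models score-level fusion as a weighted sum. The function names, the `spectral_weight` parameter, and its default value are hypothetical; the paper may use different jitter and shimmer variants and a different fusion rule.

```python
from statistics import mean


def _local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods):
    """Cycle-to-cycle variation of pitch periods (assumed 'local' variant)."""
    return _local_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of peak amplitudes (assumed 'local' variant)."""
    return _local_perturbation(amplitudes)


def fuse_features(voice_quality, prosodic):
    """Feature-level fusion, assumed here to be simple concatenation."""
    return list(voice_quality) + list(prosodic)


def fuse_scores(spectral_score, long_term_score, spectral_weight=0.9):
    """Score-level fusion as a weighted sum; the weight is a hypothetical value."""
    return spectral_weight * spectral_score + (1.0 - spectral_weight) * long_term_score


def relative_improvement(baseline_der, new_der):
    """Relative DER improvement, as a fraction of the baseline DER."""
    return (baseline_der - new_der) / baseline_der
```

With this definition of relative improvement, a hypothetical baseline DER of 20% reduced to 15% would count as a 25% relative improvement; these numbers are illustrative and are not taken from the paper.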

Similar resources

Short- and Long-Term Speech Features for Hybrid HMM-i-Vector based Speaker Diarization System

i-vectors have been successfully applied over the last years in speaker recognition tasks. This work aims at assessing the suitability of i-vector modeling within the frame of speaker diarization task. In such context, a weighted cosine-distance between two different sets of i-vectors is proposed for speaker clustering. Speech clusters generated by Viterbi segmentation are first modeled by two ...

The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization

Overlapping speech is responsible for a certain amount of the errors produced by standard speaker diarization systems in meeting environments. We are investigating a set of prosody-based long-term features as a potential complement to our overlap detection system, which relies on short-term spectral parameters. The most relevant features are selected in a two-step process. They are firstly evaluated and s...

Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features

i-vector modeling techniques have recently been used successfully for the speaker clustering task. In this work, we propose the extraction of i-vectors from short- and long-term speech features, and the fusion of their PLDA scores within the frame of speaker diarization. Two sets of i-vectors are first extracted from short-term spectral and long-term voice-quality, prosodic and glottal to noise excit...

Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric

In spectral conversion of statistical voice conversion (VC), distance-based measures between the converted and target spectral parameters are often used as evaluation or training criteria. However, even if the same speaker utters the same sentence, the spectral parameters vary utterance by utterance, and thus, spectral distance between utterances still exists. Moreover, the original prosodic fe...

Speeding Up Speaker Diarization by Using Prosodic Features

In this article we present a method to speed up agglomerative clustering used in speaker diarization by using long-term prosodic features. A set of these features is used to decide which clusters should be merged. This strategy reduces the number of decisions that have to be performed using the more calculation-intensive method based on the Bayesian Information Criterion (BIC). We show a speedu...

Publication date: 2015