Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age

نویسندگان

Christian A. Müller

Felix Burkhardt

چکیده

The most successful systems in previous comparative studies on speaker age recognition used short-term cepstral features modeled with Gaussian Mixture Models (GMMs) or applied multiple phone recognizers trained with the data of speakers of the respective class. Acoustic analyses, however, indicate that certain features such as pitch extracted from a longer span of speech correlate clearly with the speaker age although the systems based on those features have been inferior to the before mentioned approaches. In this paper, three novel systems combining short-term cepstral features and long-term features for speaker age recognition are compared to each other. A system combining GMMs using frame-based MFCCs and SupportVector-Machines using long-term pitch performs best. The results indicate that the combination of the two feature types is a promising approach, which corresponds to findings in related fields like speaker recognition.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Augmenting short-term cepstral features with long-term discriminative features for speaker verification of telephone data

Short-term cepstral features have long been chosen as standard features for speaker recognition thanks to their relevance and effectiveness. In contrast, discriminative features, calculated by a multi-layer perceptron (MLP) from much longer stretches of time, have been gradually adopted in automatic speech recognition (ASR). It has been shown that augmenting short-term cepstral features with lo...

متن کامل

Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN

In a distant-talking environment, the length of channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore, not effective under these conditions. In this paper, we propose a robust speech recognition method by combining a short-term spectrum based CMN with a long-term one. We assume that ...

متن کامل

Whether Mfcc or Gfcc Is Better for Recognizing Emotion from Speech? a Study

A major challenge for automatic speech recognition (ASR) relates to significant performance reduction in noisy environments. Recently, the study of the emotional content of speech signals got more importance and hence, many systems have been proposed to identify the emotional content of a spoken utterance. The important aspects of the design of a speech emotion recognition system are pre-proces...

متن کامل

On the Use of Long-Term Average Spectrum in Automatic Speaker Recognition

State-of-the-art automatic speaker recognition systems use mel-frequency cepstral coefficients (MFCC) features to describe the spectral properties of speakers. In forensic phonetics, the long-term average spectrum (LTAS) has been used for the same purpose. LTAS provides an intuitive graphical representation which can be used to visualize and quantify speaker differences. However, few studies ha...

متن کامل

MFCC and Prosodic Feature Extraction Techniques:

In this paper our main aim to provide the difference between cepstral and non-cepstral feature extraction techniques. Here we try to cover-up most of the comparative features of Mel Frequency Cepstral Coefficient and prosodic features. In speaker recognition, there are two type of techniques are available for feature extraction: Short-term features i.e. Mel Frequency Cepstral Coefficient (MFCC)...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2007

Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age

نویسندگان

چکیده

منابع مشابه

Augmenting short-term cepstral features with long-term discriminative features for speaker verification of telephone data

Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN

Whether Mfcc or Gfcc Is Better for Recognizing Emotion from Speech? a Study

On the Use of Long-Term Average Spectrum in Automatic Speaker Recognition

MFCC and Prosodic Feature Extraction Techniques:

عنوان ژورنال:

اشتراک گذاری