Harmonic Structure Transform for Speaker Recognition

نویسندگان

  • Kornel Laskowski
  • Qin Jin
چکیده

We evaluate a new filterbank structure, yielding the harmonic structure cepstral coefficients (HSCCs), on a mismatchedsession closed-set speaker classification task. The novelty of the filterbank lies in its averaging of energy at frequencies related by harmonicity rather than by adjacency. Improvements are presented which achieve a 37%rel reduction in error rate under these conditions. The improved features are combined with a similar Mel-frequency cepstral coefficient (MFCC) system to yield error rate reductions of 32%rel, suggesting that HSCCs offer information which is complimentary to that available to today’s MFCC-based systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring

It has long been claimed that spectral envelope features outperform prosodic features on speaker recognition tasks. However, the reasons for such an arrangement are not entirely compelling. In the current work we present some evidence to challenge these claims. We propose that energy found at harmonically related frequencies encodes the acoustic correlates of variables which are typically refer...

متن کامل

Speech pre-processing against intentional imposture in speaker recognition

Recently, some large-scale text dependent speaker verification systems have been tested. They show that less than 1% Equal Error Rate can be obtained on a test set score distribution. So far, the majority of impostor tests are performed using speakers who don’t really try to fool the system. This can be explained by the lack of databases recorded for this purpose, and the difficulty for a norma...

متن کامل

Speaker identification for whispered speech based on frequency warping and score competition

In certain situations, talkers will intentionally use whisper instead of neutral speech for the sake of privacy or confidentiality, which severely degrades the performance of speaker identification systems trained with only neutral speech. There are considerable differences in the spectral structure between whisper and neutral speech due to an absence of voice harmonic excitation. This study in...

متن کامل

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

Speaker Identification Using Admissible Wavelet Packet Based Decomposition

Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for speech recognition as well as speaker recognition. In MFCC feature representation, the Mel frequency scale is used to get a high resolution in low frequency region, and a low resolution in high frequency region. This kind of processing is good for obtaining stable phonetic information, but not suitable f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011