Harmonic Structure Transform for Speaker Recognition
Abstract
We evaluate a new filterbank structure, yielding the harmonic structure cepstral coefficients (HSCCs), on a mismatched-session, closed-set speaker classification task. The novelty of the filterbank lies in its averaging of energy at frequencies related by harmonicity rather than by adjacency. Improvements are presented which achieve a 37% relative reduction in error rate under these conditions. The improved features are combined with a similar Mel-frequency cepstral coefficient (MFCC) system to yield a further relative error rate reduction of 32%, suggesting that HSCCs offer information which is complementary to that available to today's MFCC-based systems.
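The contrast between harmonicity and adjacency can be illustrated with a minimal sketch. This is not the authors' HSCC implementation: the function names, the candidate fundamental frequencies, the number of harmonics, and the final cosine transform are all assumptions chosen for illustration. Each filter here averages power-spectrum bins at integer multiples of one candidate fundamental, whereas a Mel filter would average a band of neighbouring bins.

```python
import numpy as np

def harmonic_filterbank(n_bins, sample_rate, f0_candidates, n_harmonics=5):
    """Build one filter per candidate fundamental (hypothetical design).

    Each row puts equal weight on the spectrum bins nearest to the first
    n_harmonics integer multiples of its fundamental, then normalizes so
    that the row computes an average.
    """
    n_fft = 2 * (n_bins - 1)          # assumes a one-sided spectrum
    bin_hz = sample_rate / n_fft      # frequency spacing between bins
    fb = np.zeros((len(f0_candidates), n_bins))
    for i, f0 in enumerate(f0_candidates):
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 / bin_hz))
            if k < n_bins:            # skip harmonics above Nyquist
                fb[i, k] += 1.0
        total = fb[i].sum()
        if total > 0:
            fb[i] /= total
    return fb

def harmonic_cepstra(power_spectrum, fb, n_ceps=4):
    """Log filterbank energies followed by a DCT-II (illustrative only)."""
    log_e = np.log(fb @ power_spectrum + 1e-10)
    n = len(log_e)
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return basis @ log_e

# Usage: a synthetic spectrum with energy at multiples of 125 Hz.
fb = harmonic_filterbank(257, 8000, [100.0, 125.0, 150.0])
spectrum = np.zeros(257)
spectrum[[8, 16, 24, 32, 40]] = 1.0   # bins for 125, 250, ..., 625 Hz
energies = fb @ spectrum              # the 125 Hz filter responds most strongly
```

In this toy setting the filter tuned to 125 Hz collects all of the harmonic energy, while the neighbouring candidates collect little or none, which is the kind of information an adjacency-based filterbank does not isolate.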
Similar resources
Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring
It has long been claimed that spectral envelope features outperform prosodic features on speaker recognition tasks. However, the reasons for such an arrangement are not entirely compelling. In the current work we present some evidence to challenge these claims. We propose that energy found at harmonically related frequencies encodes the acoustic correlates of variables which are typically refer...
Speech pre-processing against intentional imposture in speaker recognition
Recently, some large-scale text dependent speaker verification systems have been tested. They show that less than 1% Equal Error Rate can be obtained on a test set score distribution. So far, the majority of impostor tests are performed using speakers who don’t really try to fool the system. This can be explained by the lack of databases recorded for this purpose, and the difficulty for a norma...
Speaker identification for whispered speech based on frequency warping and score competition
In certain situations, talkers will intentionally use whisper instead of neutral speech for the sake of privacy or confidentiality, which severely degrades the performance of speaker identification systems trained with only neutral speech. There are considerable differences in the spectral structure between whisper and neutral speech due to an absence of voice harmonic excitation. This study in...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy continues to improve, there is a considerable gap between their performance and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Speaker Identification Using Admissible Wavelet Packet Based Decomposition
Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for speech recognition as well as speaker recognition. In the MFCC feature representation, the Mel frequency scale is used to obtain high resolution in the low-frequency region and low resolution in the high-frequency region. This kind of processing is good for obtaining stable phonetic information, but not suitable f...