A comparison of fusion techniques in mel-cepstral based speaker identification
نویسندگان
چکیده
Input level fusion and output level fusion methods are compared for fusing Mel-frequency Cepstral Coefficients with their corresponding delta coefficients. A 49 speaker subset of the King database is used under wideband and telephone conditions. The best input level fusion system is more computationally complex than the output level fusion system. Both input and output fusion systems were able to outperform the best purely MFCC based system for wideband data. For King telephone data, only the output level fusion based system was able to outperform the best purely MFCC based system. Further experiments using NIST’96 data under matched and mismatched conditions were also performed. Provided it was well tuned, we found that the output level fused system always outperformed the input level fused system under all experimental conditions.
منابع مشابه
Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification
This paper describes a speaker identification system that uses complementary acoustic features derived from the vocal source excitation and the vocal tract system. Conventional speaker recognition systems typically adopt the cepstral coefficients, e.g., Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), as the representative features. The cepstral fea...
متن کاملA Framework for Multilingual Text- Independent speaker identification System
This article evaluates the performance of Extreme Learning Machine (ELM) and Gaussian Mixture Model (GMM) in the context of text independent Multi lingual speaker identification for recorded and synthesized speeches. The type and number of filters in the filter bank, number of samples in each frame of the speech signal and fusion of model scores play a vital role in speaker identification accur...
متن کاملFusion of Cross Stream Information in Speaker Verification
This paper addresses the performance of various statistical data fusion techniques for combining the complementary score information in speaker verification. The complementary verification scores are based on the static and delta cepstral features. Both LPCC (Linear prediction-based cepstral coefficients) and MFCC (mel-frequency cepstral coefficients) are considered in the study. The experiment...
متن کاملEvaluation of a speaker identification system with and without fusion using three databases in the presence of noise and handset effects
In this study, a speaker identification system is considered consisting of a feature extraction stage which utilizes both power normalized cepstral coefficients (PNCCs) and Mel frequency cepstral coefficients (MFCC). Normalization is applied by employing cepstral mean and variance normalization (CMVN) and feature warping (FW), together with acoustic modeling using a Gaussian mixture model-unive...
متن کاملImproving Speaker Identification Performance by Combining Vocal Tract Features
This paper proposes fusion and addition techniques of vocal tract features such as Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Mel Frequency Cepstral Coefficients (DMFCC) in speaker identification. Feature extraction plays an important role as a front end processing block in Speaker Identification (SI) process. Mel frequency features are used to extract the spectral characteristics o...
متن کامل