Higher Order Spectral Phase Features for Speaker Identification
Abstract
This paper investigates the use of higher order spectra (HOS) phase features for speaker identification. Within the speech processing community, short-time spectral phase information is widely regarded as unimportant for speaker recognition, and features are generally defined from the magnitude spectrum only. This paper uses features that contain both magnitude and phase spectral information. These HOS phase features are derived by integrating points along a straight line in bifrequency space. Initial experiments used unconstrained microphone speech from a database of 20 male speakers to construct a Gaussian mixture model (GMM) for each speaker. The HOS phase features achieve a correct identification rate of 98.5%, which is similar to the rate achieved by the MFCC feature set (99.4%). Further experiments were conducted on the larger YOHO database of 138 speakers. Average correct identification rates above 95% were achieved for varying population sizes up to the full 138 speakers.
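The idea of integrating along a straight line in bifrequency space can be illustrated with a minimal sketch. It assumes the standard single-frame bispectrum B(f1, f2) = X(f1) X(f2) X*(f1 + f2) and a line f2 = a * f1 through the origin. The function name `hos_phase_feature`, the slope parameter `a`, and the FFT length `nfft` are hypothetical choices for illustration and are not taken from the paper, whose exact feature definition may differ.

```python
import numpy as np

def hos_phase_feature(frame, a, nfft=256):
    """Phase of the bispectrum summed along the line f2 = a * f1.

    Illustrative sketch only: `a` (slope, 0 < a <= 1) and `nfft` are
    assumed parameters, not values specified by the paper.
    """
    if not 0.0 < a <= 1.0:
        raise ValueError("slope a must lie in (0, 1]")
    X = np.fft.fft(np.asarray(frame, dtype=float), nfft)
    # Keep f1 + f2 at or below the Nyquist bin: k1 * (1 + a) <= nfft / 2.
    kmax = int(np.floor((nfft / 2) / (1.0 + a)))
    k1 = np.arange(1, kmax + 1)
    # Nearest FFT bin to a * f1 (a discrete approximation of the line).
    k2 = np.rint(a * k1).astype(int)
    # Bispectrum samples B(k1, k2) = X(k1) X(k2) conj(X(k1 + k2)).
    B = X[k1] * X[k2] * np.conj(X[k1 + k2])
    # Discrete stand-in for the line integral, then take its phase.
    return float(np.angle(B.sum()))

# Example: one feature per slope for a single synthetic frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
features = [hos_phase_feature(frame, a) for a in (0.25, 0.5, 0.75, 1.0)]
```

Under these assumptions the result is unchanged when the frame is multiplied by a positive gain and, when `nfft` equals the frame length, when the frame is circularly shifted, since the linear phase terms cancel in the triple product.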
Similar resources
The effectiveness of higher order spectral phase features in speaker identification
This paper studies the effectiveness of higher order spectra (HOS) phase features in the task of speaker identification. Within the speech processing community, short time spectral phase information is generally regarded as unimportant for speaker recognition. In fact, the most commonly used features for speaker recognition are the Mel frequency cepstral coefficients (MFCC), which are defined f...
On the use of perceptual Line Spectral pairs Frequencies and higher-order residual moments for Speaker Identification
Conventional Speaker Identification (SI) systems utilise spectral features like Mel-Frequency Cepstral Coefficients (MFCC) or Perceptual Linear Prediction (PLP) as a frontend module. Line Spectral pairs Frequencies (LSF) are a popular alternative representation of Linear Prediction Coefficients (LPC). In this paper, an investigation is carried out to extract LSF from perceptually modified speech....
Features for speaker and language identification
In this paper we examine several features derived from the speech signal for the purpose of identifying the speaker or language. Most of the current systems for speaker and language identification use spectral features from short segments of speech. There are additional features which can be derived from the residual of the speech signal, which correspond to th...
Improving Performance of Speaker Identification System Using Complementary Information Fusion
Feature extraction plays an important role as a front-end processing block in the speaker identification (SI) process. Most SI systems use features such as Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), or Linear Predictive Cepstral Coefficients (LPCC) to represent the speech signal. Their derivations are based on short-term processing of the speech signal and...
Feature Level Compensation for Robust Speaker Identification in Mismatched Conditions
In this paper, robust front-end features are proposed to improve speaker identification (SI) performance under real-world factors such as mismatch between training and testing conditions. The most commonly used MFCC features are highly sensitive to effects such as channel and environment mismatch. Characteristics of speech change with room acoustics, ch...