Prediction of formant frequencies from linear combinations of filterbank and cepstral coefficients
Abstract
This paper addresses formant frequency prediction using multiple linear regression. The technique provides robust formant frequency estimates, but with limited precision. We apply it to predict the formant frequencies of one male speaker, comparing three different spectral representations. Mel-scaled filterbanks are shown to perform slightly better than linear-prediction-based cepstral coefficients. The baseline approach uses one linear regression model for each formant. This approach is extended by using multiple models for each formant, where each model is trained on data from one sub-band of the formant's frequency range. This method is shown to be very useful for predicting F2, which has a large frequency span: on an independent test set, using multiple models reduces the RMS error of F2 by up to 45%. Moderate amounts of training data suffice to derive the regression models, which implies that predictors can be trained for a new speaker with a small formant labelling effort.
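The regression setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the data are synthetic stand-ins for filterbank features and F2 values, all function names are hypothetical, the sub-band edges are taken from training-set quantiles, and the rule for choosing a sub-band model at prediction time (using the baseline estimate) is an assumption, since the abstract does not state how the model is selected.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ [X, 1] @ w; returns the weight vector w."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    """Apply a fitted linear model (with intercept) to feature rows X."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def rms_error(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_subband_models(X, y, edges):
    """Fit one linear model per sub-band [edges[i], edges[i+1]) of the target."""
    models = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y >= lo) & (y < hi)
        models.append(fit_linear(X[mask], y[mask]))
    return models

def predict_subband(baseline_w, models, edges, X):
    """ASSUMED selection rule: the baseline estimate picks the sub-band model."""
    coarse = predict_linear(baseline_w, X)
    idx = np.searchsorted(edges, coarse, side="right") - 1
    idx = np.clip(idx, 0, len(models) - 1)
    out = np.empty(X.shape[0])
    for k, w in enumerate(models):
        sel = idx == k
        if sel.any():
            out[sel] = predict_linear(w, X[sel])
    return out

# Synthetic stand-in data (NOT speech): 8 "filterbank" features per frame and
# a nonlinearly related target playing the role of F2 in Hz.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
a = rng.normal(size=8)
y = 1500.0 + 500.0 * np.tanh(X @ a) + rng.normal(scale=20.0, size=600)
X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

baseline = fit_linear(X_tr, y_tr)
# Three sub-bands with edges at the training-set terciles (an assumption).
edges = np.array([-np.inf, *np.quantile(y_tr, [1 / 3, 2 / 3]), np.inf])
models = fit_subband_models(X_tr, y_tr, edges)

print("baseline RMS error :", rms_error(y_te, predict_linear(baseline, X_te)))
print("sub-band RMS error :", rms_error(y_te, predict_subband(baseline, models, edges, X_te)))
```

The printed errors describe only this synthetic example and say nothing about the 45% reduction reported for real speech data.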
Similar resources
Spectral Subband Centroids as Complementary Features for Speaker Authentication
Most conventional features used in speaker authentication are based on estimation of spectral envelopes in one way or another, e.g., Mel-scale Filterbank Cepstrum Coefficients (MFCCs), Linear-scale Filterbank Cepstrum Coefficients (LFCCs) and Relative Spectral Perceptual Linear Prediction (RASTA-PLP). In this study, Spectral Subband Centroids (SSCs) are examined. These features are the centroid...
Formants Estimation Techniques for Speech Analysis
Measuring formant frequencies in speech signals is indispensable for research yet technically problematic. Accurate measurement of formant frequencies is important in many studies of speech perception and production. Unfortunately, there is no fully effective method that yields good estimates of these frequencies. This paper presents a comparative study of two techniques of speech parameteriz...
Predicting Formant Frequencies from MFCC Vectors
This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficients (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCCs and formant frequencies using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predict...
Spectral subband centroid features for speech recognition
Cepstral coefficients derived either through linear prediction (LP) analysis or from filter bank are perhaps the most commonly used features in currently available speech recognition systems. In this paper, we propose spectral subband centroids as new features and use them as supplement to cepstral features for speech recognition. We show that these features have properties similar to formant f...
Experimental evaluation of features for robust speaker identification
This paper presents an experimental evaluation of different features and channel compensation techniques for robust speaker identification. The goal is to keep all processing and classification steps constant and to vary only the features and compensations used to allow a controlled comparison. A general, maximum-likelihood classifier based on Gaussian mixture densities is used as the classifie...