Cepstral Compensation Using Statistical Linearization
Abstract
Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, it is extremely difficult to do this analytically. Previously attempted analytical solutions to the problem of noisy speech recognition have either used an overly simplified mathematical description of the effects of noise on the statistics of speech, or relied on the availability of large environment-specific adaptation sets. In this paper we present the Vector Polynomial approximationS (VPS) method to compensate for the effects of linear filtering and additive noise on the PDF of clean speech. VPS also estimates the parameters of the environment, namely the noise and the channel, by using statistically linearized approximations of these effects. We evaluate the performance of VPS using the CMU SPHINX-II system on the alphanumeric CENSUS database corrupted with artificial white Gaussian noise. VPS provides improvements of up to 15 percent in relative recognition accuracy over our previous best algorithm, VTS, while being up to 20 percent more computationally efficient.
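The idea of a statistically linearized approximation can be illustrated with a small sketch. It is not the VPS algorithm itself, whose details the abstract does not give. It assumes a commonly used log-spectral model of additive noise in which the noisy value is the clean value plus log(1 + exp(z)), where z is the noise-minus-speech difference; that model, the function names, and the choice of a first-order fit are all assumptions made for illustration. The sketch fits a line a*z + b that minimizes the expected squared error under a Gaussian z, rather than expanding around a single point as a Taylor series would.

```python
import numpy as np

def softplus(z):
    # log(1 + exp(z)), computed in a numerically stable way.
    return np.logaddexp(0.0, z)

def statistically_linearize(g, mu, sigma, order=20):
    """Fit g(z) ~ a*z + b in the least-squares sense for z ~ N(mu, sigma^2).

    The optimum is a = Cov(z, g(z)) / Var(z) and b = E[g(z)] - a * mu.
    Expectations are approximated with Gauss-Hermite quadrature.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    z = mu + np.sqrt(2.0) * sigma * nodes   # map nodes to N(mu, sigma^2)
    w = weights / np.sqrt(np.pi)            # normalized weights sum to 1
    gz = g(z)
    mean_g = np.sum(w * gz)
    cov_zg = np.sum(w * (z - mu) * (gz - mean_g))
    a = cov_zg / sigma**2
    b = mean_g - a * mu
    return a, b

if __name__ == "__main__":
    # Hypothetical statistics of the noise-minus-speech difference.
    a, b = statistically_linearize(softplus, mu=0.0, sigma=1.0)
    print("slope:", a, "offset:", b)
```

Because the fit depends on the mean and variance of z, the resulting line reflects the whole distribution of the degradation and not only its behavior at the mean, which is the general motivation for linearizing statistically.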
Similar resources
Computationally Efficient Cepstral Domain Feature Compensation
In this letter, we propose a novel approach to feature compensation performed in the cepstral domain. Processing in the cepstral domain has the advantage that the spectral correlation among different frequencies is taken into consideration. By introducing a linear approximation with diagonal covariance assumption, we modify the conventional log-spectral domain feature compensation technique to ...
Experimental evaluation of features for robust speaker identification
This paper presents an experimental evaluation of different features and channel compensation techniques for robust speaker identification. The goal is to keep all processing and classification steps constant and to vary only the features and compensations used to allow a controlled comparison. A general, maximum-likelihood classifier based on Gaussian mixture densities is used as the classifie...
Signal Processing for Robust Speech Recognition
This paper describes several new cepstral-based compensation procedures that render the SPHINX-II system more robust with respect to acoustical environment. The first algorithm, phone-dependent cepstral compensation, is similar in concept to the previously-described MFCDCN method, except that cepstral compensation vectors are selected according to the current phonetic hypothesis, rather than on ...
Speech feature compensation based on pseudo stereo codebooks for robust speech recognition in additive noise environments
In this paper, we propose several compensation approaches to alleviate the effect of additive noise on speech features for speech recognition. These approaches are simple yet efficient noise reduction techniques that use online-constructed pseudo stereo codebooks to evaluate the statistics in both clean and noisy environments. The process yields transforms for noise-corrupted speech features to ...
Fuzzy Modeling and Synchronization of a New Hyperchaotic Complex System with Uncertainties
In this paper, the synchronization of a new hyperchaotic complex system based on a T-S fuzzy model is proposed. First, the considered hyperchaotic system is represented equivalently by a T-S fuzzy model. Then, by using the parallel distributed compensation (PDC) method and by applying linear system theory and the exact linearization (EL) technique, a fuzzy controller is designed to realize the synchron...
Cepstral Features and Text-Dependent Speaker Identification – A Comparative Study
In this study, the effectiveness of combinations of cepstral features, channel compensation techniques, and different local distances in the Dynamic Time Warping (DTW) algorithm is experimentally evaluated in the text-dependent speaker identification task. Training and testing have been done with noisy telephone speech (short phrases in Bulgarian with length of about 2 seconds) selected f...