Topic areas: Robust speech recognition and adaptation
Title of paper: A novel use of residual noise model for the Parallel Model Combination
Abstract
Robust speech recognition that works across all environments has become a hot topic in speech research. Environment-adaptive methods, including parallel model combination (PMC), play an important part in improving system robustness. In this paper, PMC is investigated in depth and developed further to achieve better performance. In general noisy environments, channel distortion and noise corruption are both present and time-varying, and the signal-to-noise ratio (SNR) is often low. PMC can handle additive noise as well as convolutive noise. However, it has two shortcomings. First, even with a precisely measured convolutive noise model, it is hard to obtain high recognition accuracy because speech recognition is based on short-frame windowing. Within a frame, the modeled effect of convolutive noise on the speech is equivalent to periodic (circular) convolution in the time domain, whereas the actual distorted speech results from linear convolution. The combined noisy speech models produced by PMC therefore cannot model the actual distorted speech precisely. Second, the lower the SNR, the more the additive noise model dominates the combined noisy speech models, and good performance cannot be achieved with such an additive noise model. On the other hand, some efficient noise-reduction schemes, such as spectral subtraction (SS) and cepstral mean normalization (CMN), are known to achieve good performance to some extent. These schemes assume that the residual noise is small enough that no quantitative model of it is needed. However, in our experiments with noise-reduction schemes, the distributions of the enhanced speech clearly deviate from those of the corresponding clean speech. This suggests that modeling the residual noise is useful.
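The first shortcoming, the gap between periodic and linear convolution within a short frame, can be illustrated with a toy example. This is only a sketch: the frame values and the three-tap `channel` impulse response are hypothetical, and the linear case here pads with zeros, whereas in real speech the preceding frame's samples would leak in instead.

```python
import numpy as np

def circular_convolve(x, h):
    """Circular convolution of length len(x), computed as a product of DFTs."""
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)))

# Toy analysis frame and a hypothetical short channel impulse response.
frame = np.arange(1, 9, dtype=float)      # 8 samples: 1, 2, ..., 8
channel = np.array([1.0, 0.5, 0.25])      # assumed for illustration only

# Linear convolution, truncated to the frame length.
linear = np.convolve(frame, channel)[:len(frame)]
# Per-frame multiplication in the frequency domain implies circular convolution.
circular = circular_convolve(frame, channel)

# The two agree except for the first len(channel) - 1 samples,
# where the circular result wraps the end of the frame around.
mismatch = np.abs(linear - circular)
print(mismatch)
```

The mismatch is confined to the first `len(channel) - 1` samples of the frame, so it grows in relative importance as the channel response becomes long compared with the frame.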
Motivated by these observations, we propose a new PMC approach. Using the noise-reduced observation data, we estimate a residual noise model and combine it with the clean speech models to construct pseudo-clean speech models that closely match the noise-reduced test data. The new approach integrates with noise-reduction schemes and improves recognition accuracy in adverse environments. In our experiments, Cambridge's HTK toolkit 3.0, suitably modified to embed the PMC algorithms, was used as the test platform for continuous Mandarin digit recognition. The training data were collected in a clean office environment, while the test data included speech contaminated by white Gaussian noise at different SNR levels as well as noisy speech collected in a real environment. The experimental results demonstrate the effectiveness of the proposed approach. The paper is organized …
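As a rough illustration of the model-combination step, the sketch below combines a clean-speech mean with a residual-noise mean using the log-add approximation, one common simplification of PMC. The abstract does not state which PMC approximation the authors use, so this choice, the function name `log_add_combine`, the `gain` parameter, and all numeric values are assumptions. The sketch works directly on log-spectral means and leaves out the cepstral-to-log-spectral mapping and any variance compensation.

```python
import numpy as np

def log_add_combine(speech_log_mean, noise_log_mean, gain=1.0):
    """Log-add approximation: add the means in the linear spectral domain,
    then return to the log-spectral domain. Variances are left untouched."""
    return np.log(gain * np.exp(speech_log_mean) + np.exp(noise_log_mean))

# Toy log-spectral means for one Gaussian (three filterbank channels, made-up values).
clean_mean = np.log(np.array([4.0, 2.0, 1.0]))
# Residual-noise mean, assumed to have been estimated from noise-reduced data.
residual_mean = np.log(np.array([1.0, 1.0, 1.0]))

pseudo_clean_mean = log_add_combine(clean_mean, residual_mean)
print(np.exp(pseudo_clean_mean))
```

In channels where the residual noise is comparable to the speech energy, the combined mean shifts noticeably, which is the kind of deviation from the clean distribution that the abstract reports observing after noise reduction.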
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Residual noise compensation for robust speech recognition in nonstationary noise
We present a model-based noise compensation algorithm for robust speech recognition in nonstationary noisy environments. The effect of noise is split into a stationary part, compensated by parallel model combination, and a time varying residual. The evolution of residual noise parameters is represented by a set of state space models. The state space models are updated by Kalman prediction and t...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...