A vector Taylor series approach for environment-independent speech recognition
Authors
Abstract
In this paper we introduce a new analytical approach to environment compensation for speech recognition. Previous attempts to solve the problem of noisy speech recognition analytically have either used an overly simplified mathematical description of the effects of noise on the statistics of speech or relied on the availability of large environment-specific adaptation sets. Some of the previous methods required adaptation data consisting of simultaneously recorded, or “stereo”, recordings of clean and degraded speech. In this work we introduce the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel. The VTS approach is computationally efficient. It can be applied either to the incoming speech feature vectors or to the statistics representing these vectors. In the first case the speech is compensated and then recognized; in the second case HMM statistics are modified using the VTS formulation. Both approaches use only the actual speech segment being recognized to compute the parameters required for environmental compensation. We evaluate the performance of two implementations of VTS algorithms using the CMU SPHINX-II system on the 100-word alphanumeric CENSUS database and on the 1993 5000-word ARPA Wall Street Journal database. Artificial white Gaussian noise is added to both databases. The VTS approaches provide significant improvements in recognition accuracy compared to previous algorithms.
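The abstract does not spell out the environment model, so the following is only a minimal sketch of the statistics-side variant, assuming the log-spectral model commonly used with VTS, y = x + h + log(1 + exp(n - x - h)), where x is clean speech, n is additive noise, and h is the channel. It applies a first-order Taylor expansion around the means to a single diagonal-covariance Gaussian. The function name and argument layout are hypothetical, and the paper's own formulation may differ in its details.

```python
import math

def vts_compensate_gaussian(mu_x, var_x, mu_n, var_n, h):
    """First-order VTS approximation of the noisy-speech mean and variance.

    Assumed model (per log-spectral channel): y = x + h + log(1 + exp(n - x - h)).
    All arguments are equal-length lists; x and n are treated as independent.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn, hi in zip(mu_x, var_x, mu_n, var_n, h):
        e = math.exp(mn - mx - hi)       # noise-to-(filtered)-speech ratio at the means
        g = math.log1p(e)                # mismatch term evaluated at the expansion point
        a = 1.0 / (1.0 + e)              # dy/dx at the expansion point; dy/dn = 1 - a
        mu_y.append(mx + hi + g)         # expansion evaluated at the means
        var_y.append(a * a * vx + (1.0 - a) ** 2 * vn)  # linearized variance
    return mu_y, var_y
```

When the noise mean lies far below the speech mean, the derivative `a` approaches 1 and the clean statistics pass through shifted only by `h`; when the noise dominates, `a` approaches 0 and the output statistics approach those of the noise.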
Similar resources
HMM adaptation using vector taylor series for noisy speech recognition
In this paper we address the problem of robustness of speech recognition systems in noisy environments. The goal is to estimate the parameters of an HMM that is matched to a noisy environment, given an HMM trained with clean speech and knowledge of the acoustical environment. We propose a method based on truncated vector Taylor series that approximates the performance of a system trained with tha...
A new algorithm for robust speech recognition: the delta vector taylor series approach
In this paper we present a new model-based compensation technique called Delta Vector Taylor Series (DVTS). This new technique is an extension and improvement over the Vector Taylor Series (VTS) approach [7] that addresses several of its limitations. In particular, we present a new statistical representation for the distribution of clean speech feature vectors based on a weighted vector codeboo...
Model-based approach for robust speech recognition in noisy environments with multiple noise sources
In this paper, we consider hidden Markov model (HMM) parameter compensation in noisy environments with multiple noise sources based on the vector Taylor series (VTS) approach. General formulations for multiple environmental variables are derived and systematic expectation-maximization (EM) solutions are presented in the maximum likelihood (ML) sense. It is assumed that each noise source is independ...
Towards non-stationary model-based noise adaptation for large vocabulary speech recognition
Recognition rates of speech recognition systems are known to degrade substantially when there is a mismatch between training and deployment environments. One approach to tackling this problem is to transform the acoustic models based on the channel distortion and noise characteristics of the new environment. Currently, most model adaptation strategies assume that the noise characteristics are s...
Modeling Overlapping Speech using Vector Taylor Series
Current speaker diarization systems typically fail to successfully assign multiple speakers speaking simultaneously. According to previous studies, overlapping errors account for a large proportion of the total errors in multi-party speech diarization. In this work, we propose a new approach using Vector Taylor Series (VTS) to obtain overlapping speech models assuming individual speaker models ...