Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech
نویسندگان
چکیده
Speaker variability is a well-known problem of state-of-theart Automatic Speech Recognition (ASR) systems. In particular, handling children speech is challenging because of substantial differences in pronunciation of the speech units between adult and child speakers. To build accurate ASR systems for all types of speakers Hidden Markov Models with Gaussian Mixture Densities were intensively used in combination with model adaptation techniques. This paper compares different ways to improve the recognition of children speech and describes a novel approach relying on Class-Structured Gaussian Mixture Model (GMM). A common solution for reducing the speaker variability relies on gender and age adaptation. First, it is proposed to replace gender and age by unsupervised clustering. Speaker classes are first used for adaptation of the conventional HMM. Second, speaker classes are used for initializing structured GMM, where the components of Gaussian densities are structured with respect to the speaker classes. In a first approach mixture weights of the structured GMM are set dependent on the speaker class. In a second approach the mixture weights are replaced by explicit dependencies between Gaussian components of mixture densities (as in stranded GMMs, but here the GMMs are class-structured). The different approaches are evaluated and compared on the TIDIGITS task. The best improvement is achieved when structured GMM is combined with feature adaptation.
منابع مشابه
Singing speaker clustering based on subspace learning in the GMM mean supervector space
In this study, we propose algorithms based on subspace learning in the GMM mean supervector space to improve performance of speaker clustering with speech from both reading and singing. As a speaking style, singing introduces changes in the time-frequency structure of a speaker’s voice. The purpose of this study is to introduce advancements for speech systems such as speech indexing and retriev...
متن کاملHigh-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
متن کاملCombination of temporal domain SVD based speech enhancement and GMM based speech estimation for ASR in noise - evaluation on the AURORA2 task -
In this paper, we propose a noise robust speech recognition method by combination of temporal domain singular value decomposition(SVD) based speech enhancement and Gaussian mixture model(GMM) based speech estimation. The bottleneck of GMM based approach is a noise estimation problem. For this noise estimation problem, we incorporated the adaptive noise estimation in GMM based approach. Furtherm...
متن کاملBilateral Weighted Fuzzy C-Means Clustering
Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy CMeans (BWFCM). We used a new objective function that uses some k...
متن کاملSupervised/Unsupervised Voice Activity Detectors for Text- dependent Speaker Recognition on the RSR2015 Corpus
Voice activity detection, i.e., discrimination of the speech/nonspeech segments in a speech signal, is an important enabling technology for a variety of speech-based applications including the speaker recognition. In this work we provide a performance evaluation of the following supervised and unsupervised VAD algorithms in the context of text-dependent speaker recognition on the RSR2015 (Robus...
متن کامل