Transformation Sharing Strategies for MLLR Speaker Adaptation
Authors
Arindam Mandal (Chair of the Supervisory Committee: Professor Mari Ostendorf, Electrical Engineering)
Abstract
Maximum Likelihood Linear Regression (MLLR) estimates linear transformations of automatic speech recognition (ASR) model parameters and has achieved significant performance improvements in speaker-independent ASR systems by adapting them to target speakers. This dissertation presents evidence that these improvements are not consistent across target speakers: 15% of them show degraded performance, i.e., an increase in word error rate (WER). Robustness of MLLR adaptation is therefore an important problem, and solutions to it are crucial for ASR systems that must adapt to a wide range of speakers. This dissertation presents new research directions that address this problem, exploring two aspects of MLLR transformation sharing using a regression class tree (RCT): the design of RCTs and the online complexity control of adaptation.
The standard approach to MLLR transformation sharing uses a single speaker-independent RCT. A new approach is proposed that uses multiple RCTs, each trained on speaker-cluster-specific data and each representing a type of speaker variability. The clusters are determined by an algorithm that partitions a large corpus of speakers in the eigenspace of their MLLR transformations. ASR experiments show that choosing the appropriate RCT for a target speaker leads to a significant reduction in WER. For unsupervised adaptation, an algorithm is proposed that linearly combines MLLR transformations from cluster-specific RCTs, using weights estimated by maximizing the likelihood of the adaptation data; it achieves small improvements in WER for several tasks in English and Mandarin. More significantly, distributional analysis shows that it reduces the number of speakers who suffer a performance loss from adaptation, across ranges of adaptation data amount and WER.
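The idea of linearly combining cluster-specific MLLR transformations with likelihood-maximizing weights can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the dissertation's algorithm: the transforms are written as W = [b | A] acting on the extended mean [1, mu], there are only two clusters, the Gaussians have unit variance, and the weight is found by a simple grid search rather than by whatever estimation procedure the dissertation uses. The function names are hypothetical.

```python
def apply_transform(W, mu):
    """Adapt a mean: return A*mu + b, where W = [b | A] acts on [1, mu]."""
    xi = [1.0] + list(mu)
    return [sum(w * x for w, x in zip(row, xi)) for row in W]


def combine(transforms, weights):
    """Weighted sum of equally shaped transformation matrices."""
    rows, cols = len(transforms[0]), len(transforms[0][0])
    return [
        [sum(w * T[i][j] for w, T in zip(weights, transforms)) for j in range(cols)]
        for i in range(rows)
    ]


def log_likelihood(W, mu, frames):
    """Unit-variance Gaussian log-likelihood of the frames, up to a constant."""
    adapted = apply_transform(W, mu)
    return -0.5 * sum((x - m) ** 2 for f in frames for x, m in zip(f, adapted))


def best_weight(W1, W2, mu, frames, steps=100):
    """Grid search over w in [0, 1] for the combination w*W1 + (1-w)*W2."""
    candidates = [k / steps for k in range(steps + 1)]
    return max(
        candidates,
        key=lambda w: log_likelihood(combine([W1, W2], [w, 1.0 - w]), mu, frames),
    )
```

In this toy setting, the selected weight is the one whose combined transformation moves the model mean closest to the adaptation frames; a real system would estimate weights over many Gaussians with full statistics.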
The standard approach to complexity control in MLLR uses only the amount of adaptation data available from a target speaker. Evidence is presented that this does not produce the optimal number of regression classes, and that significant improvements in WER are achieved when the oracle number of regression classes is used. A new solution for complexity control is proposed that predicts the number of regression classes in an RCT from speaker-level features using standard statistical classifiers; it achieves moderate improvements in WER. Next, a more flexible approach is proposed that performs node-level pruning in an RCT using node-level features, and it improves the robustness of MLLR adaptation.
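For readers unfamiliar with data-driven complexity control, the following sketch shows one common way the amount of adaptation data can determine transformation sharing in an RCT: a node receives its own transformation only if enough adaptation frames fall under it, and otherwise inherits from the nearest ancestor that does. The abstract does not specify the exact rule, so the threshold scheme, the `Node` class, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    count: float = 0.0  # adaptation frames observed (set on leaves only)
    children: list = field(default_factory=list)


def occupancy(node):
    """Total adaptation frames under a node (sum over its leaves)."""
    if not node.children:
        return node.count
    return sum(occupancy(child) for child in node.children)


def assign_transforms(root, threshold):
    """Map each leaf to the lowest ancestor-or-self with enough data.

    The root is always used as a fallback, even if it is below threshold.
    """
    assignment = {}

    def visit(node, owner):
        if occupancy(node) >= threshold:
            owner = node.name  # this node gets its own transformation
        if not node.children:
            assignment[node.name] = owner
        for child in node.children:
            visit(child, owner)

    visit(root, root.name)
    return assignment
```

The number of regression classes actually used is the number of distinct owners in the returned mapping, so it grows with the amount of adaptation data and shrinks as the threshold is raised.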
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Regularized-MLLR speaker adaptation for computer-assisted language learning system
In this paper, we propose a novel speaker adaptation technique, regularized-MLLR, for Computer Assisted Language Learning (CALL) systems. This method uses a linear combination of a group of teachers’ transformation matrices to represent each target learner’s transformation matrix, thus avoids the over-adaptation problem that erroneous pronunciations come to be judged as good pronunciations afte...
Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used. All of these adaptation methods need adaptation data. However, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve th...
MDL-Based Cluster Number Decision Methods for Speaker Clustering and MLLR Adaptation
Speaker clustering is one of the major methods for speaker adaptation. MLLR (Maximum Likelihood Linear Regression) adaptation using transformation matrices corresponding to phone classes/clusters is another useful method especially when the length of utterances for adaptation is limited. In these methods, how to decide the most appropriate number of clusters is an important research issue. This...