Acoustic model selection for recognition of regional accented speech
Abstract
Accent is cited as an issue for speech recognition systems [1]. Research has shown that accent mismatch between the training and test data results in a significant reduction in the accuracy of Automatic Speech Recognition (ASR) systems. Using an HMM-based ASR system trained on a standard English accent, our study shows that error rates can be up to seven times higher for accented speech than for standard English. Hence the development of accent-robust ASR systems is of significant importance. This research investigates different acoustic modelling techniques for compensating for the effects of regional accents on the performance of ASR systems. The study includes conventional Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) and more contemporary Deep Neural Network (DNN)-HMM systems. In both cases we consider both supervised and unsupervised techniques. This work uses the WSJCAM0 corpus as a set of ‘accent neutral’ data and accented data from the Accents of the British Isles (ABI) corpora. Initially, we investigated a model selection approach based on automatic accent identification (AID). Three AID systems were developed and evaluated in this work, namely i-vector, phonotactic, and ACCDIST-SVM. Each focuses on a different property of speech to achieve AID. We use two-dimensional projections based on Expectation Maximization-Principal Component Analysis (EM-PCA) and Linear Discriminant Analysis (LDA) to visualise the different accent spaces and use these visualisations to analyse the AID and ASR results. In GMM-HMM based ASR systems, we show that using a small amount of data from a test speaker to select an accented acoustic model using AID results in superior performance compared to that obtained with unsupervised or supervised speaker adaptation. A possible objection to AID-based model selection is that in each accent there exist speakers who have varying degrees of accent, or whose accent exhibits properties of other accents.
This motivated us to investigate whether using an acoustic model created based on neighbouring speakers in the accent space can result in better performance. In conclusion, the maximum reduction in error rate achieved over all GMM-HMM based adaptation approaches is obtained by using AID to select an accent-specific model followed by speaker adaptation. It is also shown that the accuracy of an AID system does not have a high impact on the gain obtained by accent...
Similar resources
Accent detection and speech recognition for Shanghai-accented Mandarin
As speech recognition systems are used in ever more applications, it is crucial for the systems to be able to deal with accented speakers. Various techniques, such as acoustic model adaptation and pronunciation adaptation, have been reported to improve the recognition of non-native or accented speech. In this paper, we propose a new approach that combines accent detection, accent discriminative...
Unsupervised model selection for recognition of regional accented speech
This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent ...
Partial Change Accent Models Speech Recog
Regional accents in Mandarin speech result mostly from partial phone changes due to the interlanguage system of non-native speakers. We propose partial change accent models based on accent-specific units with acoustic model reconstruction for accented Mandarin speech recognition. We use phonological rules of dialectical pronunciations together with a likelihood ratio test to model actual accented...
Asymmetric Acoustic Model for Accented Speech Recognition
We propose to improve accented speech recognition performance by using an asymmetric acoustic model. Our proposed model is generated based on reliable accent-specific units and acoustic model reconstruction. The reliable units are extracted with time alignment recognition to cover accent variations at both acoustic and phonetic levels. The asymmetric acoustic model is obtained through selective de...
Acoustic and phonetic confusions in accented speech recognition
Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be ...