Unsupervised Pronunciation Adaptation for Off-line Transcription of Japanese Lecture Speeches
Abstract
Most variations in pronunciation depend strongly on the speaker and the speaking style, and introducing pronunciation variants into a speaker-independent recognition system has had limited success. We therefore refrain from applying multiple pronunciation variants in the speaker-independent case and instead introduce pronunciation variants without supervision when specializing the recognizer for a specific speaker. Our approach is to take the decoder’s output after a first recognition pass and to realign it, allowing several commonly observed pronunciation variations. In a second decoding pass, the pronunciation variations are integrated into the recognizer, weighted using Maximum Likelihood estimates of the pronunciation variants’ likelihoods computed on the realigned output of the first pass. We observe a small but significant improvement in recognition accuracy compared to the first-pass output and conclude that the method helps adjust the pronunciation modeling structure to the speaker, speaking style and speaking rate. A better prior choice of possible pronunciation variations, drawing on deeper phonetic knowledge, would be beneficial for further improvements. We also show experimentally that the improvement gained through pronunciation adaptation does not overlap much with the improvement gained by unsupervised adaptation of the acoustic models; rather, the achieved WER reductions are additive.
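The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes that the realigned first-pass output is available as a list of (word, chosen pronunciation variant) pairs and that the Maximum Likelihood estimate of a variant's weight is its relative frequency for that word. The function name, the data layout, and the example words and phone strings are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def estimate_variant_weights(aligned_pairs):
    """Relative-frequency (maximum likelihood) estimate of P(variant | word).

    aligned_pairs: iterable of (word, variant) tuples, where variant is the
    pronunciation chosen for that word during realignment (assumed format).
    """
    counts = defaultdict(Counter)
    for word, variant in aligned_pairs:
        counts[word][variant] += 1

    weights = {}
    for word, variant_counts in counts.items():
        total = sum(variant_counts.values())
        weights[word] = {v: c / total for v, c in variant_counts.items()}
    return weights


# Hypothetical realigned output for one speaker (illustrative data only).
realigned = [
    ("desu", "d e s u"),
    ("desu", "d e s"),
    ("desu", "d e s"),
    ("masu", "m a s u"),
]
weights = estimate_variant_weights(realigned)
for word, variants in weights.items():
    print(word, variants)
```

In a second decoding pass, weights of this kind could be attached to the corresponding lexicon entries; how the paper's recognizer applies them is not specified in the abstract.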
Similar resources
Automatic Speech Transcription and Archiving System using the Corpus of Spontaneous Japanese
The target of automatic speech recognition (ASR) research has shifted from read speech to spontaneous speech. The technology will enable automatic transcription (and translation) of lectures and meetings. In Japan, the “Spontaneous Speech” project has been conducted over the last five years, and we set up the huge “Corpus of Spontaneous Japanese (CSJ)”, which consists of over 2000 speeches (500 hou...
Efficient Access to Lecture Audio Archives through Spoken Language Processing
The paper first addresses the current state of speech recognition using the “Corpus of Spontaneous Japanese (CSJ)”. It is shown that the large-scale corpus had a strong impact on training acoustic and language models that account for the morphological and pronunciation variations characteristic of spontaneous Japanese. Unsupervised adaptation of these models and the speaking rate is also effec...
Unsupervised Language Model Adaptation for Lecture Speech Recognition
This paper addresses speaker adaptation of the language model in large-vocabulary spontaneous speech recognition. In spontaneous speech, the expression and pronunciation of words vary considerably depending on the speaker and topic. Therefore, we present unsupervised methods of language model adaptation to a specific speaker by (1) making direct use of the initial recognition result for generating an enha...
Unsupervised Acoustic Model Training for Simultaneous Lecture Translation in Incremental and Batch Mode
In this work, the theoretical concepts of unsupervised acoustic model training and the application and evaluation of unsupervised training schemes are described. Experiments aiming at speaker adaptation via unsupervised training are conducted on the KIT lecture translator system. Evaluation takes place with respect to training efficiency and overall system performance in dependency of the availabl...
Unsupervised adaptation of statistical language models for speech recognition
It has been demonstrated repeatedly that the acoustic models of a speaker-independent speech recognition system can benefit substantially from the application of unsupervised adaptation methods as a means of speaker enrollment. Unsupervised adaptation has, however, not yet been applied to the statistical language model component of the recognition system. We investigate two techniques with which ...