Improving Performance of Telephone-Based Mandarin Speech Recognition
Abstract
Since the telephone is the only ubiquitous communications device in the world today, it is the largest potential application field for speech technology. Telephony speech recognition is a core technique for such telephone-based speech applications. It is well known that the bandwidth of a telephone line is limited to 300 to 3400 Hz and that the telephone network introduces many inherent variations. Together, these make speech recognition over the telephone a more difficult task than its desktop counterpart. Additionally, because real applications must handle free speaking styles and diverse background environments, a system that performs perfectly in the laboratory may become very vulnerable in the real world. Robustness is therefore a life-or-death issue for such commercial systems. In this paper, we introduce our recent progress in improving the performance of a Mandarin telephony speech recognition system. Our improvements include a more robust and straightforward feature extraction block for telephony speech and a novel dynamic channel compensation algorithm. We then focus on our strategy for dealing with out-of-vocabulary (OOV) utterances. With these changes, the system's performance improves noticeably in real applications.
Similar resources
Evaluation of front-end features and noise compensation methods for robust Mandarin speech recognition
This paper describes speaker-independent speech recognition experiments concerning acoustic front-end processing on a telephone database that was recorded in various dialect regions in China. In this paper, three different features based on human voice production, perception and auditory systems have been evaluated for Mandarin speech recognition. Experimental comparisons showed that auditory-f...
Codebook Dependent Dynami for Mandarin Speech Recogn
Automatic speech recognition in telecommunications environments still has lower accuracy than its desktop counterparts. Improving the performance of telephone-quality speech recognition is an urgent problem for its application in practical fields. Previous work has shown that the main reason for this performance degradation is the variational mismatch caused by different telephone ...
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
The paper describes the design, collection, transcription and analysis of 200 hours of HKUST Mandarin Telephone Speech Corpus (HKUST/MTS) from over 2100 Mandarin speakers in mainland China under the DARPA EARS framework. The corpus includes speech data, transcriptions and speaker demographic information. The speech data include 1206 ten-minute natural Mandarin conversations between either stran...
Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors
It is well recognized that accent has a great impact on ASR for Mandarin Chinese; improving performance on accented speech has therefore become a critical issue in this field. The attribute feature has proven effective for modelling accented speech, resulting in significantly improved accent recognition performance. In this paper, we propose an attribute-base...
Improving Language Models for Mandarin Conversational Speech Recognition with Web Data
Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used web-based text collection targeted at a conversational style to augment small sets of transcribed speech; here we look at extending these techniques to Mandarin. In addition, we investigate different techniques ...