On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training
Authors
Abstract
In this paper, we explore the use of attention-based models for online speech recognition without language models or search. Our model is an attention-based neural network that directly emits English or Mandarin characters as outputs, jointly learning the pronunciation, acoustic, and language models. We evaluate the model for online speech recognition on English and Mandarin. On English, we achieve a 33.0% WER on the WSJ task, a 5.4% absolute reduction in WER compared to an online CTC-based system. We also introduce a new training method and show how to learn joint Mandarin Character-Pinyin models. Our Mandarin character-only model achieves a 72% CER on the GALE Phase 2 evaluation; our joint Mandarin Character-Pinyin model achieves 59.3% CER, a 12.7% absolute improvement over the character-only model.
Similar resources
Prosodic modeling in large vocabulary Mandarin speech recognition
The issue of incorporating prosodic information into speech recognition processes has emerged in recent years. In this work we present a complete framework for Mandarin speech recognition with prosodic modeling considering two-level hierarchical prosodic information for Mandarin Chinese. We developed a GMM-based, a decision-tree-based, and a hybrid approach. The best improvements in character r...
Linear Reranking Model for Chinese Pinyin-to-Character Conversion
Pinyin-to-character conversion is an important task in Chinese natural language processing. Previous work mainly focused on n-gram language models and machine learning approaches, or on additional hand-crafted or automatic rule-based post-processing. Word n-gram language models leave two problems unsolved: out-of-vocabulary word recognition and long-distance grammatical c...
N-Best Re-scoring Approaches for Mandarin Speech Recognition
The predominant language model for speech recognition is the n-gram language model, which is locally learned and usually lacks global linguistic information such as long-distance syntactic constraints. We first explore two n-best re-scoring approaches for Mandarin speech recognition to overcome this problem. The first approach is linear re-scoring that can combine several language models from vario...
An Empirical Study of Word Error Minimization Approaches for Mandarin Large Vocabulary Continuous Speech Recognition
This paper presents an empirical study of word error minimization approaches for Mandarin large vocabulary continuous speech recognition (LVCSR). First, the minimum phone error (MPE) criterion, which is one of the most popular discriminative training criteria, is extensively investigated for both acoustic model training and adaptation in a Mandarin LVCSR system. Second, the word error minimizat...
The Effect of Orthography on L2 Perception
Mandarin Chinese has two orthographic systems: Chinese characters and Pinyin. While Pinyin is transparent to Mandarin pronunciation, characters are opaque and seldom relate to sound. This study aims to find out the effect of these two systems on Cantonese listeners who are L2 learners of Mandarin. Native Hong Kong Cantonese speakers participated in word recognition experiments which included a ...