Lexicon Optimization: Maximizing Lexical Coverage in Speech Recognition through Automated Compounding
Abstract
In this report we show that, for Dutch speech recognition, lexical coverage can be maximized by combining a limited lexicon of word parts with real-time lexicon expansion. More specifically, we describe how the lexicon is designed and how the real-time expansion module was built and tested. Tests were performed using a lexicon of 36,000 entries. The results show that out-of-vocabulary rates are rather small, owing to automated rule-based compounding of the lexical building blocks. Statistical information was included to improve the accuracy of the rule-based compounding system. This approach proved to be successful.
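As a rough illustration of the idea in the abstract, the following minimal Python sketch shows how compounding over a small word-part lexicon can lower the out-of-vocabulary (OOV) rate. It is not the report's implementation: the function names `make_covered` and `oov_rate`, the toy lexicon, and the set of linking elements are all hypothetical, and the statistical weighting mentioned in the abstract is left out.

```python
from functools import lru_cache


def make_covered(parts, linkers=("", "s")):
    """Build a predicate that is True when a word is a lexicon entry or can be
    formed by concatenating entries, optionally joined by a linking element.

    `linkers` is an illustrative assumption, not the rule set of the report.
    """
    parts = frozenset(parts)

    @lru_cache(maxsize=None)
    def covered(word):
        if word in parts:
            return True
        # Try every split point: known head + optional linker + coverable tail.
        for i in range(1, len(word)):
            if word[:i] not in parts:
                continue
            rest = word[i:]
            for linker in linkers:
                tail = rest[len(linker):]
                if rest.startswith(linker) and tail and covered(tail):
                    return True
        return False

    return covered


def oov_rate(tokens, is_known):
    """Fraction of tokens for which `is_known` returns False."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return sum(not is_known(t) for t in tokens) / len(tokens)


# Toy data (hypothetical): four word parts and four test tokens.
lexicon = {"fiets", "pad", "dorp", "plein"}
tokens = ["fietspad", "dorpsplein", "plein", "boot"]

baseline = oov_rate(tokens, lambda w: w in lexicon)   # plain lookup
expanded = oov_rate(tokens, make_covered(lexicon))    # with compounding
```

On this toy data, plain lookup misses both compounds, whereas the compounding predicate accepts `fietspad` and `dorpsplein` and leaves only `boot` out of vocabulary. A purely rule-based splitter like this one can also accept implausible compounds, which is presumably why the report adds statistical information on top of the rules.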
Similar resources
Lexicon optimization for Dutch speech recognition in spoken document retrieval
In this paper, ongoing work concerning the language modelling and lexicon optimization of a Dutch speech recognition system for Spoken Document Retrieval is described: the collection and normalization of a training data set and the optimization of our recognition lexicon. Effects on lexical coverage of the amount of training data, of decompounding compound words and of different selection metho...
Morphological Decomposition for ASR in German
In this contribution we report on our ongoing work in lexical decomposition for automatic speech recognition (ASR). Lexical decomposition is investigated with a twofold goal: lexical coverage optimization and improved automatic letter-to-sound conversion. Whereas morphological decomposition is a widely-studied domain in linguistics, our interest is limited here to identifying and processing the ...
Combination of acoustic and lexical speaker adaptation for disordered speech recognition
This paper presents an approach to lexical adaptation in Automatic Speech Recognition (ASR) of disordered speech from a group of young impaired speakers. The outcome of an Acoustic Phonetic Decoder (APD) is used to learn new lexical variants of the 57-word vocabulary and add them to a lexicon personalized to each user. The possibilities of combination of this lexical adaptation w...
Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR
German is a highly inflected language with a large number of words derived from the same root. It makes use of a high degree of word compounding leading to high Out-of-vocabulary (OOV) rates, and Language Model (LM) perplexities. For such languages the use of sub-lexical units for Large Vocabulary Continuous Speech Recognition (LVCSR) becomes a natural choice. In this paper, we investigate the ...
Subword-based Automatic Lexicon Learning for ASR
We present a framework for learning a pronunciation lexicon for an Automatic Speech Recognition (ASR) system from multiple utterances of the same training words, where the lexical identities of the words are unknown. Instead of only trying to learn pronunciations for known words we go one step further and try to learn both spelling and pronunciation in a joint optimization. Decoding based on li...