On-line learning of acoustic and lexical units for domain-independent ASR
Abstract
We are interested in the on-line acquisition of acoustic, lexical and semantic units from spontaneous speech. Traditional ASR techniques require domain-specific knowledge of the acoustics and the lexicon and, more importantly, of the word probability distributions. In this paper we propose an algorithm for unsupervised learning of acoustic and lexical units from out-of-domain speech data. The new lexical units are used for fast adaptation of language model probabilities to a new domain. We show that, starting from the Switchboard corpus (lexicon and language model), we learn the most relevant language statistics of the "How May I Help?" task.
Related resources
Acoustic and lexical resource constrained ASR using language-independent acoustic model and language-dependent probabilistic lexical model
One of the key challenges involved in building statistical automatic speech recognition (ASR) systems is modeling the relationship between subword units or “lexical units” and acoustic feature observations. To model this relationship two types of resources are needed, namely, acoustic resources i.e., speech data with word level transcriptions and lexical resources where each word is transcribed...
Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling
Automatic speech recognition (ASR) systems incorporate expert knowledge of language or the linguistic expertise through the use of phone pronunciation lexicon (or dictionary) where each word is associated with a sequence of phones. The creation of phone pronunciation lexicon for a new language or domain is costly as it requires linguistic expertise, and includes time and money. In this thesis, ...
On recognition of non-native speech using probabilistic lexical model
Despite various advances in automatic speech recognition (ASR) technology, recognition of speech uttered by non-native speakers is still a challenging problem. In this paper, we investigate the role of different factors such as type of lexical model and choice of acoustic units in recognition of speech uttered by non-native speakers. More precisely, we investigate the influence of the probabili...
Bayesian Learning of a Language Model from Continuous Speech
We propose a novel scheme to learn a language model (LM) for automatic speech recognition (ASR) directly from continuous speech. In the proposed method, we first generate phoneme lattices using an acoustic model with no linguistic constraints, then perform training over these phoneme lattices, simultaneously learning both lexical units and an LM. As a statistical framework for this learning pro...
Improving grapheme-based ASR by probabilistic lexical modeling approach
There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) system, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states may not be always trivial. It usual...