Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systems

Authors

  • David F. Harwath
  • James R. Glass
Abstract

Modern speech recognizers rely on three core components: an acoustic model, a language model, and a pronunciation lexicon. In order to expand speech recognition capability to low-resource languages and domains, techniques to peel away the expert knowledge required to craft these three components have been growing in popularity. In this paper, we present a method for automatically learning a weighted pronunciation lexicon in a data-driven fashion without assuming the existence of any phonetic lexicon whatsoever. Given an initial grapheme acoustic model, our method utilizes a novel technique for semi-constrained acoustic unit decoding, which is used to help train a letter-to-sound (L2S) model. The L2S model is then used in conjunction with a Pronunciation Mixture Model (PMM) to infer a pronunciation lexicon. We evaluate our method on English as well as Lao and Haitian, two low-resource languages featured in the IARPA Babel program.
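
The abstract does not spell out how the weighted lexicon is estimated, so the following is only a minimal sketch of the general idea behind a pronunciation mixture: treat each candidate pronunciation of a word as a mixture component and re-estimate its weight with EM. The function name `pmm_em`, the input layout, and the example pronunciations and likelihood values are all assumptions made for illustration; this is not the authors' implementation.

```python
from typing import Dict, List


def pmm_em(likelihoods: List[Dict[str, float]], iterations: int = 20) -> Dict[str, float]:
    """Estimate mixture weights over candidate pronunciations of one word.

    likelihoods[i][b] is a hypothetical acoustic likelihood
    p(utterance_i | pronunciation b); missing entries count as 0.
    """
    candidates = sorted({b for utt in likelihoods for b in utt})
    if not candidates:
        return {}
    # Start from uniform weights over the candidate pronunciations.
    weights = {b: 1.0 / len(candidates) for b in candidates}
    for _ in range(iterations):
        counts = {b: 0.0 for b in candidates}
        for utt in likelihoods:
            # E-step: posterior of each pronunciation given this utterance.
            joint = {b: weights[b] * utt.get(b, 0.0) for b in candidates}
            total = sum(joint.values())
            if total == 0.0:
                continue  # utterance gives no evidence for any candidate
            for b in candidates:
                counts[b] += joint[b] / total
        norm = sum(counts.values())
        if norm == 0.0:
            break
        # M-step: normalized expected counts become the new weights.
        weights = {b: counts[b] / norm for b in candidates}
    return weights


# Made-up likelihoods for three utterances of one word (illustrative only).
utterances = [
    {"t ah m ey t ow": 0.8, "t ah m aa t ow": 0.2},
    {"t ah m ey t ow": 0.7, "t ah m aa t ow": 0.3},
    {"t ah m ey t ow": 0.1, "t ah m aa t ow": 0.9},
]
lexicon_weights = pmm_em(utterances)
```

In a setting like the one the abstract describes, the candidate pronunciations would presumably come from the L2S model rather than being listed by hand, and the resulting weights would form the entries of the weighted lexicon.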

Similar resources

Grapheme based speech recognition for large vocabularies

Common speech recognition systems use phonetically motivated subword units. To utilize words in these systems, one has to translate the available graphemic word representation into a phonetic one. To reduce this manual effort we propose to build grapheme based recognition systems. They can be used as speech interfaces for devices that can provide a graphemic representation of words like city na...

Multilingual non-native speech recognition using phonetic confusion-based acoustic model modification and graphemic constraints

In this paper we present an automated approach for non-native speech recognition. We introduce a new phonetic confusion concept that associates sequences of native language (NL) phones to spoken language (SL) phones. Phonetic confusion rules are automatically extracted from a non-native speech database for a given NL and SL using both NL’s and SL’s ASR systems. These rules are used to modify th...

Use of Graphemic Lexicons for Spoken Language Assessment

Automatic systems for practice and exams are essential to support the growing worldwide demand for learning English as an additional language. Assessment of spontaneous spoken English is, however, currently limited in scope due to the difficulty of achieving sufficient automatic speech recognition (ASR) accuracy. "Off-the-shelf" English ASR systems cannot model the exceptionally wide variety of...

Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem

There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, whic...

Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems

This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that word-boundary context markers provide a powerful method to enhance graphemic systems by implicit phonetic information, improving the modelling capability of graphemic systems. In addition, a robust technique for full covariance Gaus...

Publication date: 2014