Graphemes as basic units for crosslingual speech recognition

نویسندگان

  • Andrej Žgank
  • Zdravko Kačič
  • Frank Diehl
  • Jozef Juhar
  • Slavomir Lihan
  • Klara Vicsi
  • Gyorgy Szaszak
چکیده

This paper presents our work on grapheme based crosslingual speech recognition carried out within the MASPER initiative. The performance of monolingual grapheme based acoustic models is compared to the performance of monolingual acoustic models based on phonemes. The transfer between source and target language was done using an expert knowledge approach. For the experiments, German, Spanish, Hungarian and Slovak served as source languages wheras Slovenian was the target language. All experiments are based on SpeechDat databases. The results achieved by the use of grapheme based acoustic models are comparable to the ones achieved by phoneme based acoustic models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Korean large vocabulary continuous speech recognition using pseudomorpheme units

This paper presents a Korean large vocabulary continuous speech recognition system based on pseudomorpheme units. In Korean, an eojeol (word phrase) is a unit for spacing and a morpheme is the smallest unit with semantic meaning. If the eojeol is used as the dictionary and language modeling unit, the number of the unit becomes enormous. Instead we propose to use modified morpheme or pseudomorph...

متن کامل

Using Auxiliary Sources of Knowledge for Automatic Speech Recognition

Standard hidden Markov model (HMM) based automatic speech recognition (ASR) systems usually use cepstral features as acoustic observation and phonemes as subword units. Speech signal exhibits wide range of variability such as, due to environmental variation, speaker variation. This leads to different kinds of mismatch, such as, mismatch between acoustic features and acoustic models or mismatch ...

متن کامل

Improving grapheme-based ASR by probabilistic lexical modeling approach

There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) system, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states may not be always trivial. It usual...

متن کامل

Towards weakly supervised acoustic subword unit discovery and lexicon development using hidden Markov models

State-of-the-art automatic speech recognition and text-to-speech systems are based on subword units, typically phonemes. This necessitates a lexicon that maps each word to a sequence of subword units. Development of a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not be always readily available, particularly for under-resourced languages. In su...

متن کامل

Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling

Automatic speech recognition (ASR) systems incorporate expert knowledge of language or the linguistic expertise through the use of phone pronunciation lexicon (or dictionary) where each word is associated with a sequence of phones. The creation of phone pronunciation lexicon for a new language or domain is costly as it requires linguistic expertise, and includes time and money. In this thesis, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005