Automatic Language Identification Using Phoneme and Automatically Derived Unit Strings

Authors

  • Pavel Matejka
  • Igor Szöke
  • Petr Schwarz
  • Jan Cernocký
Abstract

This paper presents language identification (LID) based on phonotactic modeling. Approaches using phoneme strings and strings of units automatically derived by an ergodic HMM (EHMM) are compared. The phoneme recognizers were trained on 6 languages from the OGI multi-language corpus and on Czech SpeechDat-E. LID results are reported for 4 languages. They show the superiority of the Czech phoneme recognizer when used for LID, and promising trends for the EHMM-derived units.
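
As a rough illustration of what phonotactic modeling means in practice, the sketch below scores a recognized phoneme string against one bigram model per language and picks the best-scoring language. The abstract does not state the model order, the smoothing, or the phoneme inventory, so the add-one smoothed bigrams, the toy two-symbol alphabet, the language labels `lang_a` and `lang_b`, and the function names `train_bigram`, `score`, and `identify` are all assumptions made for this example, not the authors' system.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-string marker


def train_bigram(strings, vocab):
    """Build an add-one smoothed bigram log-probability function.

    `strings` is a list of phoneme sequences for one language;
    `vocab` is the set of phoneme symbols (assumed known in advance).
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in strings:
        padded = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    size = len(vocab)

    def logprob(prev, cur):
        # Add-one smoothing keeps unseen bigrams from scoring -infinity.
        return math.log((pair_counts[(prev, cur)] + 1) / (context_counts[prev] + size))

    return logprob


def score(logprob, phonemes):
    """Log-likelihood of a phoneme sequence under one language model."""
    padded = [START] + list(phonemes)
    return sum(logprob(prev, cur) for prev, cur in zip(padded, padded[1:]))


def identify(models, phonemes):
    """Return the language whose model gives the highest log-likelihood."""
    return max(models, key=lambda lang: score(models[lang], phonemes))


# Toy training data: invented phoneme strings, not taken from any corpus.
training = {
    "lang_a": [["a", "b", "a", "b"], ["a", "b", "a"]],
    "lang_b": [["b", "b", "a", "a"], ["b", "b", "a"]],
}
vocab = {"a", "b"}
models = {lang: train_bigram(strs, vocab) for lang, strs in training.items()}
```

In a real system the input strings would come from a phoneme recognizer (or from EHMM-derived units) rather than being written by hand; the scoring step would stay structurally the same, which is why the choice of tokenizer can be compared independently of the language models.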

Similar resources

Extracting Chinese Frequent Strings Without a Dictionary From a Chinese Corpus and its Applications

This paper describes how to extract Chinese frequent strings without using a dictionary. In this paper, we generalize the notations of words and unknown words to those of frequent strings. The Chinese frequent strings (CFSs) we define include words, unknown words, and other strings that are frequently used. Some examples of CFSs are “ (can only let)”, “ (every minute and every second)”, “ (bear...

Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment

In this paper we tackle the task of bootstrapping an Automatic Speech Recognition system without an a priori given language model, a pronunciation dictionary, or transcribed speech data for the target language Slovene – only untranscribed speech and translations to other resource-rich source languages of what was said are available. Therefore, our approach is highly relevant for under-resourced...

New variant of the Self Organizing Map in Pulsed Neural Networks to Improve Phoneme Recognition in Continuous Speech

Speech recognition has gradually improved over the years, phoneme recognition in particular. Phoneme recognition plays a very important role in speech processing. Phoneme strings are a basic representation for automatic language recognition, and it has been shown that language recognition results are highly correlated with phoneme recognition results. Nowadays, many recognizers are based on Artificial ne...

Theoretical error prediction for a language identification system using optimal phoneme clustering

Kay M. Berkling, Etienne Barnard ((berkling,barnard)@cse.ogi.edu), Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology. A neural network based language identification system is described, which uses language independent phoneme clusters as speech units to recognize the language spoken by native speakers over the tele...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Publication date: 2004