Discriminative n-gram language modeling for Turkish

Authors

Ebru Arisoy

Brian Roark

Izhak Shafran

Murat Saraclar

Abstract

In this paper, Discriminative Language Models (DLMs) are applied to the Turkish Broadcast News transcription task. Turkish presents a challenge to Automatic Speech Recognition (ASR) systems because of its rich morphology. Therefore, in addition to word n-gram features, morphology-based features such as root n-grams and inflectional group (IG) n-grams are incorporated into the DLMs to improve language modeling. Several feature sets reduce the word error rate (WER), and the best result is obtained with the inflectional group n-gram features: a 1.0% absolute improvement over the baseline model, which is statistically significant at p < 0.001 according to the NIST MAPSSWE significance test.
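The abstract does not spell out the training procedure. As a minimal sketch, assuming the perceptron-based N-best reranking often used for discriminative language models, the Python below shows how word, root, or inflectional group n-gram counts can act as features on top of a baseline ASR score. The `Hypothesis` class, the stream names `word`, `root`, and `ig`, the fixed baseline weight `w0`, and oracle selection by minimum word errors are all illustrative assumptions, not the authors' implementation; the root and IG segmentations are assumed to come from an external morphological analyzer.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# A feature is (stream name, n-gram), e.g. ("ig", ("<s>", "x")).
Feature = Tuple[str, Tuple[str, ...]]


@dataclass
class Hypothesis:
    # Hypothetical representation: one token stream per feature type,
    # e.g. {"word": [...], "root": [...], "ig": [...]}. Root and IG streams
    # are assumed to be produced by an external morphological analyzer.
    streams: Dict[str, List[str]]
    base_score: float  # baseline ASR score (e.g. combined acoustic + LM log score)
    errors: int        # word errors against the reference transcript


def ngram_counts(tokens: Sequence[str], n: int, kind: str) -> Counter:
    """Count all 1..n-grams of `tokens`, padded with sentence boundary markers."""
    padded = ["<s>", *tokens, "</s>"]
    counts: Counter = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            counts[(kind, tuple(padded[i:i + order]))] += 1
    return counts


def features(h: Hypothesis, n: int = 2) -> Counter:
    """Sum of n-gram counts over every token stream of the hypothesis."""
    feats: Counter = Counter()
    for kind, tokens in h.streams.items():
        feats.update(ngram_counts(tokens, n, kind))
    return feats


def score(h: Hypothesis, weights: Dict[Feature, float], w0: float = 1.0) -> float:
    """Linear DLM score: fixed-weight baseline score plus weighted n-gram counts."""
    return w0 * h.base_score + sum(
        weights.get(f, 0.0) * c for f, c in features(h).items()
    )


def train(nbest_lists: List[List[Hypothesis]], epochs: int = 3,
          w0: float = 1.0) -> Dict[Feature, float]:
    """Plain (non-averaged) perceptron over N-best lists."""
    weights: Dict[Feature, float] = defaultdict(float)
    for _ in range(epochs):
        for nbest in nbest_lists:
            oracle = min(nbest, key=lambda h: h.errors)  # lowest-error hypothesis
            best = max(nbest, key=lambda h: score(h, weights, w0))
            if best.errors > oracle.errors:
                # Move weights toward the oracle and away from the current winner.
                for f, c in features(oracle).items():
                    weights[f] += c
                for f, c in features(best).items():
                    weights[f] -= c
    return dict(weights)


def rerank(nbest: List[Hypothesis], weights: Dict[Feature, float],
           w0: float = 1.0) -> Hypothesis:
    """Return the hypothesis with the highest DLM score."""
    return max(nbest, key=lambda h: score(h, weights, w0))


if __name__ == "__main__":
    toy = [
        Hypothesis({"word": ["a", "b"]}, base_score=-1.0, errors=1),
        Hypothesis({"word": ["a", "c"]}, base_score=-1.5, errors=0),
    ]
    w = train([toy])
    print(rerank(toy, w).streams["word"])
```

On the toy list, the baseline score alone prefers the erroneous hypothesis; a single perceptron update raises the weights of n-grams unique to the lower-error hypothesis and lowers those unique to the winner, so reranking then selects the correct one. Populating only the `word` stream, or adding `root` and `ig` streams, is one way to compare feature sets in the spirit of the abstract.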

Similar resources

Investigation of MT-based ASR confusion models for semi-supervised discriminative language modeling

Semi-supervised discriminative language modeling uses simulated N-best lists instead of real ASR outputs as its training examples. In this study we apply two techniques in which artificial examples are generated using a WFST and an MT system trained on pairs of reference text and ASR output. We compare the performance of these techniques with the structured prediction and ranking variants of th...

Large-scale Discriminative n-gram Language Models for Statistical Machine Translation

We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systematic experiments on several benchmark tests for Chinese to English translation using a hierarchica...

Discriminative training of language model classifiers

We show how discriminative training methods, namely the Maximum Mutual Information and Maximum Discrimination approaches, can be adopted for the training of N-gram language models used as classifiers working on symbol strings. By estimating the model parameters according to a discriminative objective function instead of Maximum Likelihood, the emphasis is not put on the exact modeling of each cla...

A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling

Discriminative language modeling provides a mechanism for differentiating between competing word hypotheses, which are usually ignored in traditional maximum likelihood estimation of N-gram language models. Discriminative language modeling usually requires manual transcription, which can be costly and slow to obtain. On the other hand, there is a vast amount of untranscribed speech data on which ...

Minimum rank error training for language modeling

Discriminative training techniques have been successfully developed for many pattern recognition applications. In speech recognition, discriminative training aims to minimize the metric of word error rate. However, in an information retrieval system, the best performance should be achieved by maximizing the average precision. In this paper, we construct the discriminative n-gram language model ...

Publication date: 2008