Discriminative n-gram language modeling for Turkish
Abstract
In this paper, Discriminative Language Models (DLMs) are applied to the Turkish Broadcast News transcription task. Turkish is challenging for Automatic Speech Recognition (ASR) systems because of its rich morphology. Therefore, in addition to word n-gram features, morphology-based features such as root n-grams and inflectional group (IG) n-grams are incorporated into the DLMs to improve language modeling. Various feature sets reduce the word error rate (WER), and the best result is obtained with the inflectional group n-gram features: a 1.0% absolute WER improvement over the baseline model, which is statistically significant at p < 0.001 according to the NIST MAPSSWE significance test.
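To make the three feature sets concrete, here is a minimal Python sketch of a linear DLM that rescores an N-best list using word, root and IG n-gram features. It rests on several assumptions not stated in the abstract: roots and IG strings are taken as precomputed input (the toy analyses below are illustrative, not output of a real Turkish analyzer); the training algorithm is a structured perceptron update, chosen here as an assumption because the abstract does not name one; the baseline score is scaled by a fixed `alpha` rather than learned. All names (`Word`, `extract_features`, `rerank`, `train_perceptron`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Word(NamedTuple):
    """One word of an ASR hypothesis plus its (assumed, precomputed) analysis."""
    surface: str          # word form as output by the recognizer
    root: str             # root from a morphological analyzer (given as input)
    igs: Tuple[str, ...]  # inflectional groups of the word, in order (given as input)


# An N-best entry: (hypothesis, baseline log score, word errors vs. reference).
NBest = List[Tuple[List[Word], float, int]]


def ngram_counts(tokens: Sequence[str], n_max: int, prefix: str) -> Counter:
    """Count all 1..n_max-grams of a token sequence padded with sentence boundaries."""
    padded = ["<s>", *tokens, "</s>"]
    counts: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[f"{prefix}:{' '.join(padded[i:i + n])}"] += 1
    return counts


def extract_features(hyp: List[Word], n_max: int = 2) -> Counter:
    """Word, root and IG n-gram features of one hypothesis, in separate namespaces."""
    feats: Counter = Counter()
    feats.update(ngram_counts([w.surface for w in hyp], n_max, "w"))
    feats.update(ngram_counts([w.root for w in hyp], n_max, "r"))
    feats.update(ngram_counts([ig for w in hyp for ig in w.igs], n_max, "ig"))
    return feats


def score(weights: Dict[str, float], feats: Counter, base: float, alpha: float) -> float:
    """Linear DLM score: scaled baseline score plus weighted feature counts."""
    return alpha * base + sum(weights.get(f, 0.0) * c for f, c in feats.items())


def rerank(weights: Dict[str, float], nbest: NBest, alpha: float = 1.0) -> int:
    """Index of the highest-scoring hypothesis under the current weights."""
    return max(range(len(nbest)),
               key=lambda i: score(weights, extract_features(nbest[i][0]), nbest[i][1], alpha))


def train_perceptron(data: List[NBest], epochs: int = 3, alpha: float = 1.0) -> Dict[str, float]:
    """Perceptron training (assumed): move weights toward the oracle (fewest errors) hypothesis."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for nbest in data:
            oracle = min(range(len(nbest)), key=lambda i: nbest[i][2])
            pred = rerank(weights, nbest, alpha)
            if nbest[pred][2] > nbest[oracle][2]:  # update only on a real mistake
                gold_f = extract_features(nbest[oracle][0])
                pred_f = extract_features(nbest[pred][0])
                for f in set(gold_f) | set(pred_f):
                    weights[f] = weights.get(f, 0.0) + gold_f[f] - pred_f[f]
    return weights


# Toy example with illustrative analyses; the reference is "evde kaldı".
hyp_a = [Word("evde", "ev", ("ev+Noun+Loc",)), Word("kaldı", "kal", ("kal+Verb+Past+A3sg",))]
hyp_b = [Word("evde", "ev", ("ev+Noun+Loc",)), Word("kaldım", "kal", ("kal+Verb+Past+A1sg",))]
nbest: NBest = [(hyp_b, -10.0, 1), (hyp_a, -10.5, 0)]

weights = train_perceptron([nbest])
print(rerank({}, nbest))       # → 0 (baseline alone prefers the wrong hypothesis)
print(rerank(weights, nbest))  # → 1 (the reranker recovers the correct one)
```

Keeping the three feature types in separate namespaces (`w:`, `r:`, `ig:`) makes it easy to train with one feature set at a time and compare them. In the toy example the two hypotheses share identical root n-grams, so only the word and IG n-grams receive nonzero weights.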