Distribution-Based Pruning of Backoff Language Models

2000

Jianfeng Gao Kai-Fu Lee

We propose a distribution-based pruning of n-gram backoff language models. Instead of the conventional approach of pruning n-grams that are infrequent in training data, we prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e., the probability that an n-gram occurs in a new document. Experimental results show that our method performe...
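The pruning criterion in this abstract can be illustrated with a small sketch. It assumes the simplest possible estimate of the n-gram distribution: the probability that an n-gram occurs in a new document is taken to be the fraction of training documents that contain it. The paper's own model of the distribution is not given in the abstract and may well differ; the names `document_occurrence_prob`, `distribution_prune`, and `threshold` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_occurrence_prob(docs, n):
    """Estimate P(n-gram occurs in a new document) as the fraction of
    training documents containing it (an assumed, simple estimator)."""
    doc_freq = Counter()
    for doc in docs:
        # Count each n-gram at most once per document.
        doc_freq.update(set(ngrams(doc, n)))
    return {g: c / len(docs) for g, c in doc_freq.items()}


def distribution_prune(docs, n, threshold):
    """Keep only n-grams whose estimated occurrence probability
    meets the (hypothetical) threshold; all others are pruned."""
    probs = document_occurrence_prob(docs, n)
    return {g for g, p in probs.items() if p >= threshold}


docs = [["a", "b", "c"], ["a", "b", "d"], ["x", "a", "b"], ["c", "d"]]
print(distribution_prune(docs, 2, 0.5))  # {('a', 'b')}
```

An n-gram repeated many times inside a single document gets a high raw count but a low document-occurrence estimate, which is the contrast with count-based pruning that the abstract draws.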

Entropy-based Pruning of Backoff Language Models

Journal: CoRR 1998

Andreas Stolcke

A criterion for pruning parameters from N-gram backoff language models is developed, based on the relative entropy between the original and the pruned model. It is shown that the relative entropy resulting from pruning a single N-gram can be computed exactly and efficiently for backoff models. The relative entropy measure can be expressed as a relative change in training set perplexity. This le...
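For a single context h, pruning the n-gram (h, w) returns its discounted probability to the backoff mass and changes the backoff weight, so only p(.|h) changes. The sketch below computes the weighted term P(h) * D(p(.|h) || p'(.|h)) by brute force over a toy vocabulary, not with the exact efficient formula the abstract refers to. The function names, the dictionary layout, and the assumption that the lower-order distribution is unaffected by the pruning are illustrative only.

```python
import math


def backoff_dist(explicit, lower, vocab):
    """Conditional distribution p(.|h) of a backoff model for one context h.
    explicit: word -> discounted probability p*(w|h) for n-grams kept in the model.
    lower:    word -> lower-order probability p(w|h'), assumed strictly positive.
    The backoff weight alpha(h) is chosen so the distribution sums to 1."""
    left = 1.0 - sum(explicit.values())                 # mass not used by explicit n-grams
    lower_left = 1.0 - sum(lower[w] for w in explicit)  # lower-order mass of backed-off words
    alpha = left / lower_left
    return {w: explicit[w] if w in explicit else alpha * lower[w] for w in vocab}


def pruning_relative_entropy(p_h, explicit, lower, vocab, pruned_word):
    """P(h) * D(p(.|h) || p'(.|h)) after removing the n-gram (h, pruned_word).
    Brute-force sum over the vocabulary; the backoff weight is recomputed."""
    before = backoff_dist(explicit, lower, vocab)
    kept = {w: p for w, p in explicit.items() if w != pruned_word}
    after = backoff_dist(kept, lower, vocab)
    return p_h * sum(p * math.log(p / after[w]) for w, p in before.items() if p > 0)


vocab = ["a", "b", "c"]
explicit = {"a": 0.5, "b": 0.3}          # explicit n-grams (h, a) and (h, b)
lower = {"a": 0.4, "b": 0.4, "c": 0.2}   # lower-order distribution
print(pruning_relative_entropy(1.0, explicit, lower, vocab, "b"))
```

In the criterion the abstract describes, n-grams whose removal causes only a small relative entropy are the ones pruned; the conversion to a relative change in training-set perplexity is not shown here.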

Phrase-based language models for speech recognition

1999

Hong-Kwang Jeff Kuo Wolfgang Reichl

Including phrases in the vocabulary list can improve n-gram language models used in speech recognition. In this paper, we report results of automatic extraction of phrases from the training text using frequency, likelihood, and correlation criteria. We show how a language model built from a vocabulary that includes useful phrases can systematically improve language model perplexity in a natural ...

Programme for "Exemplar Based Models of Language..."

2007

Ingo Plag Remko Scha Neal Snider Jean-Pierre Nadal Dave Cochran David Cochran

...ions and the retention of exemplars of which those abstractions are composed. The work by Batali (2002), who investigates the emergence of semantic structures in language acquisition and evolution, can also be viewed in ...

Language Models Based on Semantic Composition

2009

Jeff Mitchell Mirella Lapata

In this paper we propose a novel statistical language model to capture long-range semantic dependencies. Specifically, we apply the concept of semantic composition to the problem of constructing predictive history representations for upcoming words. We also examine the influence of the underlying semantic space on the composition task by comparing spatial semantic representations against topic-...

Stream-based Randomised Language Models for SMT

2009

Abby D. Levenberg Miles Osborne

Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream...

Topic-based language models using EM

1999

Daniel Gildea Thomas Hofmann

In this paper, we propose a novel statistical language model to capture topic-related long-range dependencies. Topics are modeled in a latent variable framework in which we also derive an EM algorithm to perform a topic factor decomposition based on a segmented training corpus. The topic model is combined with a standard language model to be used for on-line word prediction. Perplexity results ...

Decoding with shrinkage-based language models

2010

Ahmad Emami Stanley F. Chen Abraham Ittycheriah Hagen Soltau Bing Zhao

In this paper, we investigate the use of a class-based exponential language model when directly integrated into speech recognition or machine translation decoders. Recently, a novel class-based language model, Model M, was introduced and was shown to outperform regular n-gram models on moderate amounts of Wall Street Journal data. This model was motivated by the observation that shrinking the s...

Category-based Statistical Language Models Synopsis

1997

Thomas Niesler

Language models are computational techniques and structures that describe word sequences produced by human subjects, and the work presented here considers primarily their application to automatic speech-recognition systems. Due to the very complex nature of natural languages as well as the need for robust recognition, statistically-based language models, which assign probabilities to word seque...

The Effect of Using Portfolio as an Assessment Method on the Sense of Autonomy of Iranian Pre-intermediate Language Learners

Thesis: Islamic Azad University, Central Tehran Branch, Faculty of Literature and Foreign Languages, 1390 (Solar Hijri)

توران بسطامی, حسن سودمند افشار, شعله کلاهی

Abstract: The current study set out to address whether the implementation of portfolio assessment would foster the autonomy of Iranian pre-intermediate EFL learners. Participants comprised 60 female learners at the pre-intermediate level, aged 16 to 28. They were selected from among 90 language learners based on their scores on a language proficiency test, the Key English Test (KET). Then, t...