Search results for: based language models

Number of results: 3,803,296

2000
Jianfeng Gao, Kai-Fu Lee

We propose a distribution-based pruning of n-gram backoff language models. Instead of the conventional approach of pruning n-grams that are infrequent in training data, we prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution i.e. the probability that an n-gram occurs in a new document. Experimental results show that our method performe...
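A minimal sketch of the idea, under illustrative assumptions (bigrams, whitespace tokenization, a toy corpus, and a made-up threshold; the paper's actual n-gram distribution model is more elaborate):

```python
from collections import Counter

def document_frequency_pruning(documents, threshold):
    """Keep only bigrams likely to occur in a new document.

    Rather than pruning by raw training-data frequency, score each
    bigram by the fraction of training documents that contain it --
    a crude estimate of the probability that it appears in an unseen
    document -- and prune everything below the threshold.
    """
    doc_freq = Counter()
    for doc in documents:
        tokens = doc.split()
        # count each bigram at most once per document
        doc_freq.update({(a, b) for a, b in zip(tokens, tokens[1:])})
    n_docs = len(documents)
    return {bg for bg, df in doc_freq.items() if df / n_docs >= threshold}

docs = [
    "the model predicts the next word",
    "the model assigns a probability",
    "pruning removes rare n grams",
]
kept = document_frequency_pruning(docs, threshold=0.5)
# ("the", "model") occurs in 2 of 3 documents and is kept;
# bigrams confined to a single document are pruned
```

The contrast with conventional pruning is that a bigram frequent in one long document but absent elsewhere would survive count-based pruning yet be dropped here.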

Journal: CoRR, 1998
Andreas Stolcke

A criterion for pruning parameters from N-gram backoff language models is developed, based on the relative entropy between the original and the pruned model. It is shown that the relative entropy resulting from pruning a single N-gram can be computed exactly and efficiently for backoff models. The relative entropy measure can be expressed as a relative change in training set perplexity. This le...
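A hedged sketch of the criterion for a single bigram. One simplification: `p_backoff` here stands in for the lower-order probability that replaces the pruned estimate, taken as given; the paper recomputes the backoff weight exactly rather than assuming it.

```python
import math

def pruning_cost(p_h, p_w_given_h, p_backoff):
    """Relative-entropy cost of pruning one explicit bigram estimate.

    When p(w|h) is dropped, prediction of w after history h falls back
    to p_backoff; the divergence this introduces, weighted by how often
    the history h occurs, is p(h) * p(w|h) * log(p(w|h) / p_backoff).
    N-grams with the smallest cost are pruned first.
    """
    return p_h * p_w_given_h * math.log(p_w_given_h / p_backoff)

# An explicit estimate close to its backoff value is cheap to prune
cheap = pruning_cost(p_h=0.1, p_w_given_h=0.21, p_backoff=0.20)
costly = pruning_cost(p_h=0.1, p_w_given_h=0.60, p_backoff=0.20)
```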

1999
Hong-Kwang Jeff Kuo, Wolfgang Reichl

Including phrases in the vocabulary list can improve n-gram language models used in speech recognition. In this paper, we report results of automatic extraction of phrases from the training text using frequency, likelihood, and correlation criteria. We show how a language model built from a vocabulary that includes useful phrases can systematically improve language model perplexity in a natural ...
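As an illustration of the correlation criterion, a sketch using pointwise mutual information over adjacent word pairs (the function name, toy sentence, and threshold are hypothetical; the paper also evaluates frequency and likelihood criteria):

```python
import math
from collections import Counter

def candidate_phrases(tokens, min_pmi):
    """Propose two-word phrases whose words co-occur more often than
    chance, scored by pointwise mutual information:
    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    phrases = {}
    for (a, b), count in bigrams.items():
        pmi = math.log((count / n) / ((unigrams[a] / n) * (unigrams[b] / n)))
        if pmi >= min_pmi:
            phrases[(a, b)] = pmi
    return phrases

tokens = "new york is larger than new jersey but new ideas travel fast".split()
phrases = candidate_phrases(tokens, min_pmi=1.0)
# ("new", "york") scores log(4) ~= 1.39 here despite "new" being frequent
```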

2007
Ingo Plag, Remko Scha, Neal Snider, Jean-Pierre Nadal, Dave Cochran, David Cochran

...ions and the retention of exemplars of which those abstractions are composed. The work by Batali (2002), who investigates the emergence of semantic structures in language acquisition and evolution, can also be viewed in ...

2009
Jeff Mitchell, Mirella Lapata

In this paper we propose a novel statistical language model to capture long-range semantic dependencies. Specifically, we apply the concept of semantic composition to the problem of constructing predictive history representations for upcoming words. We also examine the influence of the underlying semantic space on the composition task by comparing spatial semantic representations against topic-...
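The composition idea can be sketched with two of the simplest operators, componentwise addition and multiplication (toy 3-dimensional vectors with made-up values; the paper compares a range of composition functions and underlying semantic spaces):

```python
def add_compose(u, v):
    """Additive composition: the representation of a word pair is
    the componentwise sum of the word vectors."""
    return [a + b for a, b in zip(u, v)]

def mult_compose(u, v):
    """Multiplicative composition: the componentwise product, which
    emphasizes dimensions the two words share."""
    return [a * b for a, b in zip(u, v)]

# Hypothetical semantic-space vectors for a two-word history
horse = [0.9, 0.1, 0.4]
run = [0.7, 0.8, 0.2]

history_add = add_compose(horse, run)
history_mult = mult_compose(horse, run)
```

Either composed vector can then serve as the predictive history representation for scoring upcoming words.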

2009
Abby D. Levenberg, Miles Osborne

Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream...

1999
Daniel Gildea, Thomas Hofmann

In this paper, we propose a novel statistical language model to capture topic-related long-range dependencies. Topics are modeled in a latent variable framework in which we also derive an EM algorithm to perform a topic factor decomposition based on a segmented training corpus. The topic model is combined with a standard language model to be used for on-line word prediction. Perplexity results ...
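The combination step can be sketched as linear interpolation of a topic-conditional word probability with the baseline n-gram probability (the weight and the per-word probabilities below are illustrative; the paper derives the topic factors with an EM algorithm, which is not reproduced here):

```python
import math

def interpolate(p_topic, p_ngram, lam):
    """Mix a topic-model word probability with a standard n-gram
    probability using interpolation weight lam."""
    return lam * p_topic + (1 - lam) * p_ngram

def perplexity(word_probs):
    """Perplexity of a word sequence from its per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# Hypothetical per-word probabilities on a short test stream
pairs = [(0.20, 0.05), (0.10, 0.08), (0.30, 0.04)]  # (topic, n-gram)
mixed = [interpolate(pt, pn, lam=0.3) for pt, pn in pairs]
baseline = [pn for _, pn in pairs]
# when the topic model assigns higher probability to on-topic words,
# the interpolated model scores the stream with lower perplexity
```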

2010
Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao

In this paper, we investigate the use of a class-based exponential language model when directly integrated into speech recognition or machine translation decoders. Recently, a novel class-based language model, Model M, was introduced and was shown to outperform regular n-gram models on moderate amounts of Wall Street Journal data. This model was motivated by the observation that shrinking the s...

1997
Thomas Niesler

Language models are computational techniques and structures that describe word sequences produced by human subjects, and the work presented here considers primarily their application to automatic speech-recognition systems. Due to the very complex nature of natural languages as well as the need for robust recognition, statistically-based language models, which assign probabilities to word seque...

Thesis: Islamic Azad University, Central Tehran Branch - Faculty of Literature and Foreign Languages, 1390 [2011]

Abstract: The current study set out to address whether the implementation of portfolio assessment would give rise to Iranian pre-intermediate EFL learner autonomy. Participants comprised 60 female learners at the pre-intermediate level, within the age range of 16-28. They were selected from among 90 language learners based on their scores on a language proficiency test (Key English Test). Then, t...
