Combination of word-based and category-based language models

Authors

  • Thomas Niesler
  • Philip C. Woodland
Abstract

A language model combining word-based and category-based n-grams within a backoff framework is presented. Word n-grams conveniently capture sequential relations between particular words, while the category model, which is based on part-of-speech classifications and allows ambiguous category membership, is able to generalise to unseen word sequences and is therefore appropriate in backoff situations. Experiments on the LOB, Switchboard and WSJ0 corpora demonstrate that the technique greatly improves language model perplexities for sparse training sets, and offers significantly improved complexity versus performance tradeoffs when compared with standard trigram models.
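The backoff idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a bigram history, one fixed category per word (the paper allows ambiguous membership), absolute discounting for seen word bigrams, and add-one smoothing of category transitions. The class `BackoffLM`, its method names, the toy sentences and the `word2cat` mapping are all hypothetical.

```python
from collections import Counter, defaultdict


class BackoffLM:
    """Word bigram that backs off to a category bigram (illustrative sketch)."""

    def __init__(self, sentences, word2cat, discount=0.5):
        self.word2cat = word2cat            # assumption: one category per word
        self.discount = discount            # assumption: absolute discounting
        self.bigrams = Counter()            # count(v, w)
        self.context = Counter()            # how often v occurs as a history
        self.followers = defaultdict(set)   # distinct words seen after v
        self.words = Counter()              # count(w)
        self.cat_bigrams = Counter()        # count(cat(v), cat(w))
        self.cat_context = Counter()        # how often a category is a history
        for sent in sentences:
            self.words.update(sent)
            for v, w in zip(sent, sent[1:]):
                self.bigrams[v, w] += 1
                self.context[v] += 1
                self.followers[v].add(w)
                cv, cw = word2cat[v], word2cat[w]
                self.cat_bigrams[cv, cw] += 1
                self.cat_context[cv] += 1
        self.cat_words = Counter()          # number of tokens in each category
        for w, n in self.words.items():
            self.cat_words[word2cat[w]] += n
        self.num_cats = len(self.cat_words)

    def p_category(self, w, v):
        """P(cat(w) | cat(v)) * P(w | cat(w)); w and v must be training words."""
        cv, cw = self.word2cat[v], self.word2cat[w]
        # Add-one smoothing keeps unseen category transitions above zero.
        p_trans = (self.cat_bigrams[cv, cw] + 1) / (self.cat_context[cv] + self.num_cats)
        p_member = self.words[w] / self.cat_words[cw]
        return p_trans * p_member

    def prob(self, w, v):
        """P(w | v): discounted word bigram if seen, else category backoff."""
        n_v = self.context[v]
        if n_v == 0:
            # v never seen as a history: rely on the category model alone.
            return self.p_category(w, v)
        if self.bigrams[v, w] > 0:
            return (self.bigrams[v, w] - self.discount) / n_v
        # Probability mass freed by discounting the seen bigrams of v.
        freed = self.discount * len(self.followers[v]) / n_v
        # Category-model mass on words not seen after v (assumed non-zero).
        unseen = 1.0 - sum(self.p_category(u, v) for u in self.followers[v])
        return freed * self.p_category(w, v) / unseen


# Toy data, invented for illustration only.
sentences = [["the", "cat", "sleeps"], ["the", "dog", "barks"], ["a", "cat", "barks"]]
word2cat = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
            "sleeps": "VERB", "barks": "VERB"}
lm = BackoffLM(sentences, word2cat)
print(lm.prob("barks", "dog"))    # seen word bigram: discounted relative frequency
print(lm.prob("sleeps", "dog"))   # unseen word bigram: category-based backoff
```

In this toy corpus "dog sleeps" never occurs, yet it receives a non-zero probability because a NOUN followed by a VERB is frequent. This is the kind of generalisation to unseen word sequences that the abstract attributes to the category model.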

Similar resources

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also, developing Persian tools will provide Persian progr...

Combination of words and word categories in varigram histories

This paper presents a new kind of language model: category/word varigrams. This special model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history. This provides a much greater flexibility than previous combinations of word-based and category-based language models. ...

Pars Morph: a morphological analyzer for the Persian language

In this paper, the theoretical foundation, implementation and uses of Pars Morph, a Persian morphological analyzer, are introduced. Pars Morph is a rule-based Persian morphological analysis system, which analyzes the internal structure of words in Persian and determines the grammatical category and function of the word parts. Pars Morph being in link with a lexicon covering about 45...

Terminology of Combining the Sentences of Farsi Language with the Viterbi Algorithm and BI-GRAM Labeling

This paper, based on the Viterbi algorithm, selects the most likely combination of different wording from a variety of scenarios. In this regard, the Bi-gram and Unigram tags of each word, based on the letters forming the words, as well as the bigram and unigram labels After the breakdown into the composition or moment of transition from the decomposition to the combination obtained from th...

Category-based Language Models in a Spanish Spoken Dialogue System

The main goal of this work is to study whether a language model based on categories can improve the performance of a dialogue system application, as it does when non-spontaneous and larger English corpora are used. Firstly, several sets of categories, which are generated on the basis of different classification criteria, are obtained. Then, for each criterion, two language models are generated: A l...

Publication date: 1996