Nonlocal Language Modeling based on Context Co-occurrence Vectors

نویسندگان

  • Sadao Kurohashi
  • Manabu Ori
چکیده

This paper presents a novel nonlocal language model which utilizes contextual information. A reduced vector space model calculated from co-occurrences of word pairs provides word co-occurrence vectors. The sum of word cooccurrence vectors represents the context of a document, and the cosine similarity between the context vector and the word co-occurrence vectors represents the long-distance lexical dependencies. Experiments on the Mainichi Newspaper corpus show signi cant improvement in perplexity (5.0% overall and 27.2% on target vocabulary)

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

CASE-QA: Context and Syntax embeddings for Question Answering On Stack Overflow

Question answering (QA) systems rely on both knowledge bases and unstructured text corpora. Domain-specific QA presents a unique challenge, since relevant knowledge bases are often lacking and unstructured text is difficult to query and parse. This project focuses on the QUASAR-S dataset (Dhingra et al., 2017) constructed from the community QA site Stack Overflow. QUASAR-S consists of Cloze-sty...

متن کامل

AI-KU: Using Substitute Vectors and Co-Occurrence Modeling For Word Sense Induction and Disambiguation

Word sense induction aims to discover different senses of a word from a corpus by using unsupervised learning approaches. Once a sense inventory is obtained for an ambiguous word, word sense discrimination approaches choose the best-fitting single sense for a given context from the induced sense inventory. However, there may not be a clear distinction between one sense and another, although for...

متن کامل

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

متن کامل

Exploring the Validity of Corpus-derived Measures of Semantic Similarity

Lexical co-occurrence counts from large corpora have been used to construct highdimensional vector-space models of language. In this type of model words are represented as vectors (or points) in a hyperspace, and distances between word vectors are generally considered to reflect semantic similarity. Two issues must be addressed if a vector-space model is to be used as a 'semantic' measuring dev...

متن کامل

A Study on Word Similarity using Context Vector Models

There is a need to measure word similarity when processing natural languages, especially when using generalization, classification, or example -based approaches. Usually, measures of similarity between two words are defined according to the distance between their semantic classes in a semantic taxonomy . The taxonomy approaches are more or less semantic -based that do not consider syntactic sim...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000