Raw Corpus Word Sense Disambiguation

Author

  • Ted Pedersen
Abstract

A wide range of approaches have been applied to word sense disambiguation. However, most require manually crafted knowledge such as annotated text, machine-readable dictionaries or thesauri, semantic networks, or aligned bilingual corpora. The reliance on these knowledge sources limits portability, since they generally exist only for selected domains and languages. This poster presents a corpus-based approach in which multiple usages of an ambiguous word are divided into a specified number of sense groups. The grouping is based strictly on features that are automatically obtained from the immediately surrounding raw text.

We are given N sentences, each of which contains a usage of a particular ambiguous word. Each sentence is converted into a feature vector (F1, F2, ..., Fn, S). Here (F1, ..., Fn) represent the observed contextual properties of the sentence, and S represents the unobserved sense of the ambiguous word.

A probabilistic model is built from this data. First, we specify a parametric form that describes the interactions among the observed contextual features and the unknown sense. We use the form commonly known as Naive Bayes because of its favorable performance in previous studies of supervised disambiguation (e.g., Gale et al., 1992; Mooney, 1996; Ng, 1997). When applied to disambiguation, the Naive Bayes model implies that all contextual features are conditionally independent given the sense of the ambiguous word:

$$P(F_1, F_2, \ldots, F_n, S) = P(S) \prod_{i=1}^{n} P(F_i \mid S)$$
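The following is a minimal sketch of the setup described above, under several assumptions. The abstract does not list the exact contextual features. The function `feature_vector` is therefore hypothetical: it simply takes the words at fixed offsets around the ambiguous word.

The abstract is also cut off before it states how the parameters are estimated when S is never observed. This sketch assumes the EM algorithm with add-alpha smoothing, which is one standard way to fit a Naive Bayes mixture with a hidden class. `em_naive_bayes` and its parameters are illustrative names, not the author's code. The scoring function `log_joint` is a direct transcription of the Naive Bayes factorization P(S) times the product of P(Fi | S).

```python
import math
import random
from collections import defaultdict


def feature_vector(tokens, target, window=2):
    """Hypothetical features (F1..Fn): the words at offsets -window..+window
    (excluding 0) around the first occurrence of `target`; "<pad>" marks
    positions that fall outside the sentence. S is not part of the output."""
    i = tokens.index(target)
    feats = []
    for off in range(-window, window + 1):
        if off != 0:
            j = i + off
            feats.append(tokens[j] if 0 <= j < len(tokens) else "<pad>")
    return tuple(feats)


def log_joint(x, k, prior, cond):
    """log P(F1..Fn, S=k) = log P(S=k) + sum_i log P(Fi = x[i] | S=k)."""
    return math.log(prior[k]) + sum(math.log(cond[i][k][v]) for i, v in enumerate(x))


def em_naive_bayes(data, n_senses, iters=50, alpha=0.1, seed=0):
    """Fit a Naive Bayes model whose sense S is unobserved, using EM
    (an assumed estimation method); return a sense-group index per vector."""
    rng = random.Random(seed)
    n_feats = len(data[0])
    vocab = [sorted({x[i] for x in data}) for i in range(n_feats)]
    # Start from a random soft assignment of each vector to the sense groups.
    resp = []
    for _ in data:
        w = [rng.random() for _ in range(n_senses)]
        total = sum(w)
        resp.append([wk / total for wk in w])
    for _ in range(iters):
        # M-step: smoothed estimates of P(S) and P(Fi | S) from soft counts.
        mass = [sum(r[k] for r in resp) for k in range(n_senses)]
        prior = [(mass[k] + alpha) / (len(data) + n_senses * alpha) for k in range(n_senses)]
        cond = []
        for i in range(n_feats):
            table = []
            for k in range(n_senses):
                counts = defaultdict(float)
                for x, r in zip(data, resp):
                    counts[x[i]] += r[k]
                denom = mass[k] + alpha * len(vocab[i])
                table.append({v: (counts[v] + alpha) / denom for v in vocab[i]})
            cond.append(table)
        # E-step: posterior P(S=k | F1..Fn) for every vector, computed in log space.
        resp = []
        for x in data:
            logs = [log_joint(x, k, prior, cond) for k in range(n_senses)]
            m = max(logs)
            w = [math.exp(lv - m) for lv in logs]
            total = sum(w)
            resp.append([wk / total for wk in w])
    # Hard assignment: the most probable sense group for each sentence.
    return [max(range(n_senses), key=lambda k: r[k]) for r in resp]


sentences = [
    "he sat on the river bank fishing".split(),
    "she walked along the river bank today".split(),
    "he deposited cash at the bank downtown".split(),
    "she opened an account at the bank downtown".split(),
]
X = [feature_vector(s, "bank") for s in sentences]
groups = em_naive_bayes(X, n_senses=2)
print(groups)
```

The group numbers returned are arbitrary labels, not dictionary senses. EM only finds a local optimum, so a different `seed` may permute the labels or produce a different grouping.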

Published: 1998