Relieving the Data Acquisition Bottleneck in Word Sense Disambiguation

Author

  • Mona T. Diab
Abstract

Supervised learning methods for WSD yield better performance than unsupervised methods. Yet the availability of clean training data for the former is still a severe challenge. In this paper, we present an unsupervised bootstrapping approach for WSD which exploits huge amounts of automatically generated noisy data for training within a supervised learning framework. The method is evaluated using the 29 nouns in the English Lexical Sample task of SENSEVAL2. Our algorithm does as well as supervised algorithms on 31% of this test set, which is an improvement of 11% (absolute) over state-of-the-art bootstrapping WSD algorithms. We identify seven different factors that impact the performance of our system.


Similar resources

Learning Semantic Classes for Word Sense Disambiguation

Word Sense Disambiguation suffers from the long-standing problem of the knowledge acquisition bottleneck. Although state-of-the-art supervised systems report good accuracies for selected words, they have not been shown to be promising in terms of scalability. In this paper, we present an approach for learning a coarser and more general set of concepts from a sense-tagged corpus in order to alleviate th...



Iterative Constrained Clustering for Subjectivity Word Sense Disambiguation

Subjectivity word sense disambiguation (SWSD) is a supervised and application-specific word sense disambiguation task disambiguating between subjective and objective senses of a word. Not surprisingly, SWSD suffers from the knowledge acquisition bottleneck. In this work, we use a “cluster and label” strategy to generate labeled data for SWSD semi-automatically. We define a new algorithm called It...


Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguati...


Bypassing Knowledge Acquisition Bottleneck with Bayesian Word Sense Induction

We use Bayesian topic modeling techniques adapted to the task of unsupervised word sense induction on acronyms in clinical text and investigate (1) the amount of annotated data needed by such approaches to match the performance of supervised sense disambiguation systems, and (2) the feasibility of using an automatically collected silver standard for such techniques. A dataset of ambiguous abbre...




Publication date: 2004