Relieving the Data Acquisition Bottleneck in Word Sense Disambiguation
Abstract
Supervised learning methods for WSD yield better performance than unsupervised methods. Yet the availability of clean training data for the former is still a severe challenge. In this paper, we present an unsupervised bootstrapping approach for WSD which exploits huge amounts of automatically generated noisy data for training within a supervised learning framework. The method is evaluated using the 29 nouns in the English Lexical Sample task of SENSEVAL2. Our algorithm does as well as supervised algorithms on 31% of this test set, which is an improvement of 11% (absolute) over state-of-the-art bootstrapping WSD algorithms. We identify seven different factors that impact the performance of our system.
Similar resources
Learning Semantic Classes for Word Sense Disambiguation
Word Sense Disambiguation suffers from the long-standing problem of the knowledge acquisition bottleneck. Although state-of-the-art supervised systems report good accuracies for selected words, they have not been shown to be promising in terms of scalability. In this paper, we present an approach for learning a coarser and more general set of concepts from a sense-tagged corpus in order to alleviate th...
Learning Semantic Classes for Word Sense Disambiguation
Word Sense Disambiguation suffers from the long-standing problem of the knowledge acquisition bottleneck. Although state-of-the-art supervised systems report good accuracies for selected words, they have not been shown to be promising in terms of scalability. In this paper, we present an approach for learning a coarser and more general set of concepts from a sense-tagged corpus, in order to alleviate t...
Iterative Constrained Clustering for Subjectivity Word Sense Disambiguation
Subjectivity word sense disambiguation (SWSD) is a supervised and application-specific word sense disambiguation task that disambiguates between subjective and objective senses of a word. Not surprisingly, SWSD suffers from the knowledge acquisition bottleneck. In this work, we use a “cluster and label” strategy to generate labeled data for SWSD semi-automatically. We define a new algorithm called It...
Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguati...
Bypassing Knowledge Acquisition Bottleneck with Bayesian Word Sense Induction
We use Bayesian topic modeling techniques adapted to the task of unsupervised word sense induction on acronyms in clinical text and investigate (1) the amount of annotated data needed by such approaches to match the performance of supervised sense disambiguation systems, and (2) the feasibility of using an automatically collected silver standard for such techniques. A dataset of ambiguous abbre...