One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction
نویسندگان
چکیده
Supervised word sense disambiguation (WSD) systems are usually the best performing systems when evaluated on standard benchmarks. However, these systems need annotated training data to function properly. While there are some publicly available open source WSD systems, very few large annotated datasets are available to the research community. The two main goals of this paper are to extract and annotate a large number of samples and release them for public use, and also to evaluate this dataset against some word sense disambiguation and induction tasks. We show that the open source IMS WSD system trained on our dataset achieves stateof-the-art results in standard disambiguation tasks and a recent word sense induction task, outperforming several task submissions and strong baselines.
منابع مشابه
Discriminating Among Word Meanings by Identifying Similar Contexts
Word sense discrimination is an unsupervised clustering problem, which seeks to discover which instances of a word/s are used in the same meaning. This is done strictly based on information found in raw corpora, without using any sense tagged text or other existing knowledge sources. Our particular focus is to systematically compare the efficacy of a range of lexical features, context represent...
متن کاملInducing Sense-Discriminating Context Patterns from Sense-Tagged Corpora
Traditionally, context features used in word sense disambiguation are based on collocation statistics and use only minimal syntactic and semantic information. Corpus Pattern Analysis is a technique for producing knowledge-rich context features that capture sense distinctions. It involves (1) identifying sense-carrying context patterns and (2) using the derived context features to discriminate b...
متن کاملPartially Supervised Sense Disambiguation by Learning Sense Number from Tagged and Untagged Corpora
Supervised and semi-supervised sense disambiguation methods will mis-tag the instances of a target word if the senses of these instances are not defined in sense inventories or there are no tagged instances for these senses in training data. Here we used a model order identification method to avoid the misclassification of the instances with undefined senses by discovering new senses from mixed...
متن کاملNoun Sense Induction and Disambiguation using Graph-Based Distributional Semantics
We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based secondorder similarity networks. We then add features for disambiguation from het...
متن کاملSemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses
Most work on word sense disambiguation has assumed that word usages are best labeled with a single sense. However, contextual ambiguity or fine-grained senses can potentially enable multiple sense interpretations of a usage. We present a new SemEval task for evaluating Word Sense Induction and Disambiguation systems in a setting where instances may be labeled with multiple senses, weighted by t...
متن کامل