Finding the Optimal Number of Clusters for Word Sense Disambiguation
Abstract
Ambiguity is an inherent problem for many tasks in Natural Language Processing. Unsupervised and semi-supervised approaches to ambiguity resolution are appealing because they lower the cost of manual labour. Typically, however, these methods struggle to estimate the number of senses without supervision. This paper presents research on applying stopping functions to clustering algorithms in order to estimate the number of senses. The experiments were performed for Polish and English. We found that estimation based on PK2 stopping functions is encouraging, but only when coarse-grained sense distinctions are used.
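The core idea is to choose the number of senses by watching how a clustering criterion improves as the number of clusters k grows. It can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the PK2 definition used in the SenseClusters cluster-stopping tools, PK2(k) = crfun(k) / crfun(k - 1). Here crfun(k) is a criterion function that grows as clusters get tighter (for example CLUTO's I2) and is computed after clustering the target word's contexts into k groups. It also assumes the selection rule "choose the k whose PK2 value is closest to, but not below, 1 plus the standard deviation of all PK2 values". The function names `pk2_scores` and `pk2_select`, the fallback behaviour, and the criterion values in the example are hypothetical. They do not come from the paper's experiments.

```python
from statistics import pstdev


def pk2_scores(crfun: dict[int, float]) -> dict[int, float]:
    """Return PK2(k) = crfun(k) / crfun(k - 1) for every k whose predecessor is present."""
    return {k: crfun[k] / crfun[k - 1] for k in sorted(crfun) if k - 1 in crfun}


def pk2_select(crfun: dict[int, float]) -> int:
    """Pick a number of clusters from criterion-function values using a PK2-style rule.

    Assumption: the chosen k is the one whose PK2 ratio is closest to, but not
    below, 1 + the (population) standard deviation of all PK2 ratios.
    """
    scores = pk2_scores(crfun)
    if not scores:
        raise ValueError("need criterion values for at least two consecutive k")
    threshold = 1.0 + pstdev(list(scores.values()))
    # Ratios well above 1 mean that adding a cluster still improved the criterion noticeably.
    candidates = {k: s for k, s in scores.items() if s >= threshold}
    if not candidates:
        # Hypothetical fallback: no step improves enough, so keep the smallest k tried.
        return min(crfun)
    # The candidate just above the threshold marks the last "worthwhile" split.
    return min(candidates, key=lambda k: candidates[k] - threshold)


if __name__ == "__main__":
    # Hypothetical I2-style criterion values for k = 1..6 clusters of one word's contexts.
    crfun = {1: 10.0, 2: 14.0, 3: 18.2, 4: 18.9, 5: 19.3, 6: 19.5}
    for k, s in pk2_scores(crfun).items():
        print(f"k={k}: PK2={s:.3f}")
    print("estimated number of senses:", pk2_select(crfun))
```

On these toy values the ratios fall from 1.4 and 1.3 to about 1.04 and below after k = 3, so the rule returns 3. Working with ratios rather than raw criterion values makes the rule independent of the criterion's scale. The threshold line is where published variants of the rule may differ, so it is kept in one place.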