Addressing the MFS Bias in WSD systems
نویسندگان
چکیده
Word Sense Disambiguation (WSD) systems tend to have a strong bias towards assigning the Most Frequent Sense (MFS), which results in high performance on the MFS but in a very low performance on the less frequent senses. We addressed the MFS bias in WSD systems by combining the output from a WSD system with a set of mostly static features to create a MFS classifier to decide when to and not to choose the MFS. The output from this MFS classifier, which is based on the Random Forest algorithm, is then used to modify the output from the original WSD system. We applied our classifier to one of the state-of-the-art supervised WSD systems, i.e. IMS, and to of the best state-of-the-art unsupervised WSD systems, i.e. UKB. Our main finding is that we are able to improve the system output in terms of choosing between the MFS and the less frequent senses. When we apply the MFS classifier to fine-grained WSD, we observe an improvement on the less frequent sense cases, whereas we maintain the overall recall.
منابع مشابه
Unsupervised Most Frequent Sense Detection using Word Embeddings
An acid test for any new Word Sense Disambiguation (WSD) algorithm is its performance against the Most Frequent Sense (MFS). The field of WSD has found the MFS baseline very hard to beat. Clearly, if WSD researchers had access to MFS values, their striving to better this heuristic will push the WSD frontier. However, getting MFS values requires sense annotated corpus in enormous amounts, which ...
متن کاملDetecting Most Frequent Sense using Word Embeddings and BabelNet
Since the inception of the SENSEVAL evaluation exercises there has been a great deal of recent research into Word Sense Disambiguation (WSD). Over the years, various supervised, unsupervised and knowledge based WSD systems have been proposed. Beating the first sense heuristics is a challenging task for these systems. In this paper, we present our work on Most Frequent Sense (MFS) detection usin...
متن کاملUnsupervised WSD based on Automatically Retrieved Examples: The Importance of Bias
This paper explores the large-scale acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We have applied the “WordNet monosemous relatives” method to construct automatically a web corpus that we have used to train disambiguation systems. The corpus-building process has highlighted important factors, such as the distribution of senses (bias). The corpus has been used to trai...
متن کاملDAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation
We introduce Peripheral Diversity (PD) as a knowledge-based approach to achieve multilingual Word Sense Disambiguation (WSD). PD exploits the frequency and diverse use of word senses in semantic subgraphs derived from larger sense inventories such as BabelNet, Wikipedia, and WordNet in order to achieve WSD. PD’s f -measure scores for SemEval 2013 Task 12 outperform the Most Frequent Sense (MFS)...
متن کاملSelection bias and its implications for case-control studies: a case study of magnetic field exposure and childhood leukaemia.
Based on the epidemiological association between residential exposure to extremely low frequency-magnetic fields (ELF-MF) and childhood leukaemia, the International Agency for Research on Cancer classified ELF-MF as a possible human carcinogen. Since clear supportive laboratory evidence is lacking and biophysical plausibility of carcinogenicity of MFs is questioned, a causal relationship betwee...
متن کامل