Detecting uncertainty in biomedical literature: a simple disambiguation approach using sparse random indexing
Abstract
This paper presents a novel approach to hedge detection, the task of identifying so-called hedge cues in order to label sentences as certain or uncertain. This is the classification problem for Task 1 of the CoNLL-2010 Shared Task, which focuses on hedging in biomedical literature. We propose to view hedge detection as a simple disambiguation problem, restricted to words that have previously been observed as hedge cues. Using an SVM classifier, the approach achieves the best published results so far for sentence-level uncertainty prediction on the Shared Task test data. We also show that random indexing can compress the dimensionality of the original feature space by several orders of magnitude while at the same time yielding better classifier performance.
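As a rough illustration of the compression step mentioned in the abstract, the sketch below shows one common form of sparse random indexing: every original feature is assigned a sparse ternary index vector, and an instance is represented by the sum of the index vectors of its active features. This is not taken from the paper itself; the function names, the dimensionality, the number of non-zero entries, and the seed are all assumptions chosen for the example.

```python
import numpy as np


def make_index_vectors(n_features, dim, nonzeros, seed=0):
    """Assign each original feature a sparse ternary index vector.

    Each row gets randomly placed non-zero entries, half +1 and half -1;
    all other entries are 0. If `nonzeros` is odd it is rounded down to
    the nearest even number. All parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    index = np.zeros((n_features, dim), dtype=np.int8)
    signs = np.array([1, -1] * (nonzeros // 2), dtype=np.int8)
    for f in range(n_features):
        # Choose distinct positions so the signs never overwrite each other.
        positions = rng.choice(dim, size=len(signs), replace=False)
        index[f, positions] = signs
    return index


def compress(active_features, index):
    """Sum the index vectors of the features active in one instance."""
    return index[list(active_features)].sum(axis=0, dtype=np.int32)


# Hypothetical setup: 1000 original features compressed to 64 dimensions.
index = make_index_vectors(n_features=1000, dim=64, nonzeros=4)
vector = compress([3, 17, 256], index)
```

The resulting fixed-length vectors could then be passed to a classifier such as an SVM in place of the original high-dimensional feature vectors. Because the index vectors are sparse and nearly orthogonal, instances that share many features tend to stay close to each other in the compressed space.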
Similar resources
Predicting speculation: a simple disambiguation approach to hedge detection in biomedical literature
BACKGROUND This paper presents a novel approach to the problem of hedge detection, which involves identifying so-called hedge cues for labeling sentences as certain or uncertain. This is the classification problem for Task 1 of the CoNLL-2010 Shared Task, which focuses on hedging in the biomedical domain. We here propose to view hedge detection as a simple disambiguation problem, restricted to ...
Word Sense Disambiguation Using Random Indexing
This paper presents the results of an experiment to apply a novel semantic representational formalism called Random Indexing for the supervised word sense disambiguation of English words. Random Indexing uses high-dimensional sparse vectors with random patterns modeling neural activation patterns in the brain to represent linguistic information. The presented learning and disambiguating method ...
Sense-Based Biomedical Indexing and Retrieval
This paper tackles the problem of term ambiguity, especially for biomedical literature. We propose and evaluate two methods of Word Sense Disambiguation (WSD) for biomedical terms and integrate them to a sense-based document indexing and retrieval framework. Ambiguous biomedical terms in documents and queries are disambiguated using the Medical Subject Headings (MeSH) thesaurus and semantically...
Using Latent Semantic Indexing as a Measure of Conceptual Association for Noun Compound Disambiguation
Noun compounds are a frequently occurring yet highly ambiguous construction in natural language; their interpretation relies on extra-syntactic information. Several statistical methods for compound disambiguation have been reported in the literature; however, a striking feature of all these approaches is that disambiguation relies on statistics derived from unambiguous compounds in training, me...
Discovering Word Senses from Text Using Random Indexing
Random Indexing is a novel technique for dimensionality reduction while creating Word Space model from a given text. This paper explores the possible application of Random Indexing in discovering word senses from the text. The words appearing in the text are plotted onto a multi-dimensional Word Space using Random Indexing. The geometric distance between words is used as an indicative of their ...