HedgeHunter: A System for Hedge Detection and Uncertainty Classification
نویسنده
چکیده
With the dramatic growth of scientific publishing, Information Extraction (IE) systems are becoming an increasingly important tool for large scale data analysis. Hedge detection and uncertainty classification are important components of a high precision IE system. This paper describes a two part supervised system which classifies words as hedge or nonhedged and sentences as certain or uncertain in biomedical and Wikipedia data. In the first stage, our system trains a logistic regression classifier to detect hedges based on lexical and Part-of-Speech collocation features. In the second stage, we use the output of the hedge classifier to generate sentence level features based on the number of hedge cues, the identity of hedge cues, and a Bag-of-Words feature vector to train a logistic regression classifier for sentence level uncertainty. With the resulting classification, an IE system can then discard facts and relations extracted from these sentences or treat them as appropriately doubtful. We present results for in domain training and testing and cross domain training and testing based on a simple union of training sets.
منابع مشابه
A hybridization of evolutionary fuzzy systems and ant Colony optimization for intrusion detection
A hybrid approach for intrusion detection in computer networks is presented in this paper. The proposed approach combines an evolutionary-based fuzzy system with an Ant Colony Optimization procedure to generate high-quality fuzzy-classification rules. We applied our hybrid learning approach to network security and validated it using the DARPA KDD-Cup99 benchmark data set. The results indicate t...
متن کاملKernel-Based Logical and Relational Learning with kLog for Hedge Cue Detection
Hedge cue detection is a Natural Language Processing (NLP) task that consists of determining whether sentences contain unreliable or uncertain information. This binary classification problem, i.e. distinguishing factual versus uncertain sentences, only recently received attention in the NLP community. We use kLog, a new logical and relational language for kernel-based learning, to tackle this p...
متن کاملSpeculation and negation annotation in natural language texts: what the case of BioScope might (not) reveal
In information extraction, it is of key importance to distinguish between facts and uncertain or negated information. In other words, IE applications have to treat sentences / clauses containing uncertain or negated information differently from factual information that is why the development of hedge and negation detection systems has received much interest – e.g. the objective of the CoNLL2010...
متن کاملDetecting uncertainty in biomedical literature: a simple disambiguation approach using sparse random indexing
This paper presents a novel approach to the problem of hedge detection, which involves the identification of so-called hedge cues for labeling sentences as certain or uncertain. This is the classification problem for Task 1 of the CoNLL-2010 Shared Task, which focuses on hedging in biomedical literature. We here propose to view hedge detection as a simple disambiguation problem, restricted to w...
متن کاملA Hedgehop over a Max-Margin Framework Using Hedge Cues
In this paper, we describe the experimental settings we adopted in the context of the 2010 CoNLL shared task for detecting sentences containing uncertainty. The classification results reported on are obtained using discriminative learning with features essentially incorporating lexical information. Hyper-parameters are tuned for each domain: using BioScope training data for the biomedical domai...
متن کامل