A Lucene and Maximum Entropy Model Based Hedge Detection System
Abstract
This paper describes the hedge detection approach we developed for the CoNLL 2010 shared task. Our implementation uses supervised learning. Hedge cue annotations in the training data serve as the seed for building a reliable hedge cue set. A Maximum Entropy (MaxEnt) model is used as the learning technique to determine uncertainty. By making use of Apache Lucene, we are able to perform fuzzy string matching to extract hedge cues and to incorporate part-of-speech (POS) tags in hedge cues. Our system not only determines whether a sentence is certain but also finds all the hedges it contains. Our system was ranked third on the Wikipedia dataset. In later experiments with different parameters, we further improved our results, reaching a 0.612 F-score on the Wikipedia dataset and a 0.802 F-score on the biological dataset.
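The fuzzy cue matching mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Levenshtein edit distance in pure Python as a stand-in for Lucene's fuzzy queries, and the cue list `CUES`, the function names, and the distance threshold are all hypothetical choices made for illustration. The MaxEnt classification step is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_hedge_cues(sentence: str, cues: list[str], max_distance: int = 1):
    """Return (token, matched_cue) pairs for tokens that fuzzily match a cue.

    Assumption: whitespace tokenization and a fixed edit-distance threshold.
    Cues of three characters or fewer must match exactly, to limit false hits.
    """
    tokens = [t.strip(".,;:!?()").lower() for t in sentence.split()]
    hits = []
    for tok in tokens:
        if not tok:
            continue
        for cue in cues:
            limit = max_distance if len(cue) > 3 else 0
            if edit_distance(tok, cue) <= limit:
                hits.append((tok, cue))
                break  # record at most one cue per token
    return hits


# Hypothetical seed cue set; the paper derives its set from training annotations.
CUES = ["may", "suggest", "possibly", "likely"]

if __name__ == "__main__":
    print(find_hedge_cues("These results suggests that the protein possibly binds DNA.", CUES))
```

In this sketch a sentence would be flagged as uncertain whenever the returned list is non-empty; the system described above instead passes cue evidence to a trained MaxEnt model for the final decision.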
Similar resources
Entropy Based Fuzzy Rule Weighting for Hierarchical Intrusion Detection
Predicting different behaviors in computer networks is the subject of much data mining research. Providing a balanced Intrusion Detection System (IDS) that directly addresses the trade-off between detecting new attack types and maintaining a low false detection rate is a fundamental challenge. Many of the proposed methods perform well in one of the two aspects, and concentrate on a su...
Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules
This paper describes a hybrid, two-level approach for resolving hedge cues, the problem of the CoNLL 2010 shared task. First, a maximum entropy classifier is applied to identify cue words, using both syntactic and surface-oriented features. Second, a set of manually crafted rules, operating on dependency representations and the output of the classifier, is applied to resolve the scope of the hed...
Hedge Detection Using the RelHunter Approach
RelHunter is a machine-learning-based method for extracting structured information from text. Here, we apply RelHunter to the Hedge Detection task, proposed as the CoNLL-2010 Shared Task. RelHunter’s key design idea is to model the target structures as a relation over entities. The method decomposes the original task into three subtasks: (i) Entity Identification; (ii) Candidate Relatio...
Maximum Entropy Analysis for G/G/1 Queuing System (TECHNICAL NOTE)
This paper provides the steady-state queue-size distribution for a G/G/1 queue by using the principle of maximum entropy. For this purpose, we use the average queue length and the normalizing condition as constraints to derive the queue-size distribution. Our results give a good approximation, as demonstrated by a numerical illustration. In the particular case when the squared coefficient of variation of inter-arr...
Cross Entropy-Based High-Impedance Fault Detection Algorithm for Distribution Networks
The low fault current of high-impedance faults (HIFs) is one of the main challenges for the protection of distribution networks. The inability of conventional overcurrent relays to detect these faults allows the electric arc to persist, which causes fire and electric shock hazards and poses a serious threat to human life and network equipment. This paper presents an HIF detection algori...