Removing Noisy Mentions for Distant Supervision (Eliminando Menciones Ruidosas para la Supervisión a Distancia)
Abstract
Relation Extraction methods based on Distant Supervision rely on true tuples to retrieve noisy mentions, which are then used to train traditional supervised relation extraction methods. In this paper we analyze the sources of noise in the mentions, and explore simple methods to filter out noisy mentions. The results show that a combination of mention frequency cut-off, Pointwise Mutual Information and removal of mentions which are far from the feature centroids of relation labels is able to significantly improve the results of two relation extraction models.
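The three filters named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mention representation (a dict with hypothetical `pattern`, `label`, and `vector` keys), the thresholds, and the use of Euclidean distance to the per-label centroid are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def frequency_filter(mentions, min_count=2):
    """Drop mentions whose pattern occurs fewer than min_count times (assumed cut-off)."""
    counts = Counter(m["pattern"] for m in mentions)
    return [m for m in mentions if counts[m["pattern"]] >= min_count]


def pmi_scores(mentions):
    """PMI(pattern, label) = log(P(pattern, label) / (P(pattern) * P(label)))."""
    n = len(mentions)
    joint = Counter((m["pattern"], m["label"]) for m in mentions)
    patterns = Counter(m["pattern"] for m in mentions)
    labels = Counter(m["label"] for m in mentions)
    return {
        (p, l): math.log(c * n / (patterns[p] * labels[l]))
        for (p, l), c in joint.items()
    }


def pmi_filter(mentions, min_pmi=0.0):
    """Drop mentions whose pattern is weakly associated with their label."""
    scores = pmi_scores(mentions)
    return [m for m in mentions if scores[(m["pattern"], m["label"])] >= min_pmi]


def centroid_filter(mentions, max_dist=1.0):
    """Drop mentions whose feature vector lies far from its label's centroid."""
    by_label = defaultdict(list)
    for m in mentions:
        by_label[m["label"]].append(m["vector"])
    # Centroid = component-wise mean of all vectors sharing a label.
    centroids = {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in by_label.items()
    }
    return [
        m for m in mentions
        if math.dist(m["vector"], centroids[m["label"]]) <= max_dist
    ]
```

The filters are independent functions, so they can be chained in any order, for example `centroid_filter(pmi_filter(frequency_filter(mentions)))`. The order and thresholds that work best would need to be tuned on held-out data.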
Similar resources
Improving Distant Supervision for Information Extraction Using Label Propagation Through Lists
Because of polysemy, distant labeling for information extraction leads to noisy training data. We describe a procedure for reducing this noise by using label propagation on a graph in which the nodes are entity mentions, and mentions are coupled when they occur in coordinate list structures. We show that this labeling approach leads to good performance even when off-the-shelf classifiers are us...
Modeling Relations and Their Mentions without Labeled Text
Several recent works on relation extraction have been applying the distant supervision paradigm: instead of relying on annotated text to learn how to predict relations, they employ existing knowledge bases (KBs) as source of supervision. Crucially, these approaches are trained based on the assumption that each sentence which mentions the two related entities is an expression of the given relati...
CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases
Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. ...
AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding
Distant supervision has been widely used in current systems of fine-grained entity typing to automatically assign categories (entity types) to entity mentions. However, the types so obtained from knowledge bases are often incorrect for the entity mention’s local context. This paper proposes a novel embedding method to separately model “clean” and “noisy” mentions, and incorporates the given typ...
Metodología DoRCU para la Ingeniería de Requerimientos
Abstract. DoRCU (Documentación de Requerimientos Centrada en el Usuario, User-Centered Requirements Documentation) is a methodology for Requirements Engineering characterized by its flexibility and user orientation. It takes the best results of the approaches examined and relies on various methods, techniques, and tools already developed by other authors, but without committing to the guidelines of a paradig...