Semantic Relation Extraction. Resources, Tools and Strategies
Abstract
Relation extraction is a subtask of information extraction that aims to obtain instances of semantic relations expressed in texts. This information can be stored in machine-readable formats, which are useful for several applications that need structured semantic knowledge. The work presented in this paper explores different strategies to automate the extraction of semantic relations from texts in Portuguese, Galician and Spanish. Both machine learning (distantly supervised and supervised) and rule-based techniques are investigated, and the impact of different levels of linguistic knowledge is analyzed for each approach. Regarding domains, the experiments focus on the extraction of encyclopedic knowledge: classifiers for biographical relations are developed (in a closed domain), and an open information extraction tool is evaluated. To implement the extraction systems, several natural language processing tools have been built for the three research languages, from sentence splitting and tokenization modules to part-of-speech taggers, named entity recognizers and coreference resolution systems. Furthermore, several lexica and corpora have been compiled and enriched with different levels of linguistic annotation, which are useful for both training and testing probabilistic and symbolic models. As a result of this work, new resources and tools are available for the automated processing of texts in Portuguese, Galician and Spanish.
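To make the idea of rule-based extraction of biographical relations concrete, here is a minimal sketch that pulls birthplace triples out of single sentences with one surface pattern per language. It is not the author's system: the trigger verbs, the relation label `birthPlace`, and the function name `extract_birthplaces` are hypothetical, and the "named entity" is approximated by a run of capitalized words (optionally joined by connectors such as "de" or "da") instead of a real named entity recognizer. The real pipeline described above also relies on tokenization, part-of-speech tagging, NER and coreference resolution, all of which this toy skips.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for a named entity: capitalized words, optionally
# linked by lowercase connectors common in Iberian names ("Rosalía de Castro").
UPPER = "A-ZÁÉÍÓÚÂÊÔÃÕÇÑ"
WORD = rf"[{UPPER}]\w+"
NAME = rf"{WORD}(?:\s+(?:(?:de|da|do|del|dos|das)\s+)?{WORD})*"

# Hypothetical trigger verbs for "was born in", one per research language.
TRIGGERS = {
    "pt": r"nasceu\s+em",   # Portuguese
    "es": r"naci[óo]\s+en", # Spanish (accepts unaccented "nacio")
    "gl": r"naceu\s+en",    # Galician
}


def extract_birthplaces(sentence: str, lang: str) -> List[Tuple[str, str, str]]:
    """Return (subject, "birthPlace", object) triples matched by one surface pattern."""
    trigger = TRIGGERS[lang]
    pattern = re.compile(rf"\b(?P<subj>{NAME})\s+(?:{trigger})\s+(?P<obj>{NAME})")
    return [
        (m.group("subj"), "birthPlace", m.group("obj"))
        for m in pattern.finditer(sentence)
    ]


if __name__ == "__main__":
    examples = [
        ("pt", "Fernando Pessoa nasceu em Lisboa em 1888."),
        ("es", "Gabriel García Márquez nació en Aracataca."),
        ("gl", "Rosalía de Castro naceu en Santiago de Compostela."),
    ]
    for lang, sentence in examples:
        print(lang, extract_birthplaces(sentence, lang))
```

A hand-written pattern like this is precise but brittle: it misses paraphrases ("natural de Lisboa") and pronominal subjects, which is one reason to compare rule-based extraction with supervised and distantly supervised classifiers and to add coreference resolution to the pipeline.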
Similar resources
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Relation Extraction for Semantic Intranet Annotations
We present an approach for ontology-driven extraction of relations from texts aimed mainly at producing enriched semantic annotations for the Semantic Web. The approach exploits linguistic and empirical strategies, by means of a pipeline method involving processes such as a parser, part-of-speech tagger, named entity recognition system, and pattern-based classification, and resources including on...
Language Resources and Annotation Tools for Cross-Sentence Relation Extraction
In this paper, we present a novel combination of two types of language resources dedicated to the detection of relevant relations (RE) such as events or facts across sentence boundaries. One of the two resources is the sar-graph, which aggregates for each target relation tens of thousands of linguistic patterns of semantically associated relations that signal instances of the target relation (Uszko...
A hybrid approach for relation extraction aimed to semantic annotations
We present an approach for relation extraction from texts aimed to enrich the semantic annotations produced by a semantic web portal. The approach exploits linguistic and empirical strategies, by means of a pipeline method involving processes such as a parser, part-of-speech tagger, named entity recognition system, pattern-based classification and word sense disambiguation models, and resources...