Geographic information extraction using natural language processing in Wikipedia texts
Abstract
Geographic information extracted from texts is a valuable source of location data about documents, which can be used to improve information retrieval and document indexing. Linked Data and digital gazetteers provide a large amount of data that can support the recognition of places mentioned in text. Natural Language Processing techniques, which have evolved significantly over recent years, offer tools and resources to perform named entity recognition (NER), more specifically directed towards identifying place names and relationships between places and other entities. In this work, we demonstrate the use of NER on texts as a way to detect relationships between places that can be used to enrich an ontological gazetteer. We use a collection of Wikipedia articles as a test dataset to demonstrate the validity of this idea. Results indicate that a significant volume of place/non-place and place/place relationships can be detected using the proposed techniques.
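The gazetteer-supported recognition the abstract describes can be illustrated with a minimal sketch: look up known place names in a text and treat co-occurrence within a sentence as a candidate place/place relationship. The gazetteer entries and the sample sentence below are hypothetical examples, and a real system would draw on Linked Data sources such as GeoNames or DBpedia rather than a hand-written dictionary.

```python
# Minimal sketch of gazetteer-supported place recognition and
# place/place relationship detection. The gazetteer and the sample
# sentence are hypothetical illustrations, not the paper's data.
import re
from itertools import combinations

# Toy gazetteer: place name -> place type.
GAZETTEER = {
    "Lisbon": "city",
    "Porto": "city",
    "Portugal": "country",
}

def find_places(text):
    """Return gazetteer entries mentioned in the text, in textual order."""
    found = []
    for name in GAZETTEER:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), name))
    return [name for _, name in sorted(found)]

def place_place_pairs(text):
    """Places co-occurring in one sentence suggest a candidate relationship."""
    return list(combinations(find_places(text), 2))

sentence = "Lisbon and Porto are the two largest cities in Portugal."
print(find_places(sentence))        # ['Lisbon', 'Porto', 'Portugal']
print(place_place_pairs(sentence))
```

A production pipeline would replace the dictionary lookup with a statistical NER model and use the gazetteer to disambiguate and type the recognized names.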
Similar resources
A Resource-Poor Approach for Linking Ontology Classes to Wikipedia Articles
The applicability of ontologies for natural language processing depends on the ability to link ontological concepts and relations to their realisations in texts. We present a general, resource-poor account to create such a linking automatically by extracting Wikipedia articles corresponding to ontology classes. We evaluate our approach in an experiment with the Music Ontology. We consider linki...
An Ontology-Based Approach for Key Phrase Extraction
Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and is a difficult but essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that exploits the Vietnamese Wikipedia as an ontology and exploits specif...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Building a Corpus for Named Entity Recognition using Portuguese Wikipedia and DBpedia
Some natural language processing tasks can be learned from example corpora, but having enough examples for the task at hand can be a bottleneck. In this work we address how Wikipedia and DBpedia, two freely available language resources, can be used to support Named Entity Recognition, a fundamental task in Information Extraction and a necessary step of other tasks such as Co-reference Resoluti...
Information Extraction from Wikipedia Using Pattern Learning
In this paper we present solutions for the crucial task of extracting structured information from massive free-text resources, such as Wikipedia, for the sake of semantic databases serving upcoming Semantic Web technologies. We demonstrate both a verb frame-based approach using deep natural language processing techniques with extraction patterns developed by human knowledge experts and machine ...