NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud
Abstract
We have often heard that data is the new oil. In particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. Several approaches have been proposed to extract knowledge from textual documents: extracting named entities, classifying them according to pre-defined taxonomies, and disambiguating them through URIs that identify real-world entities. As a step towards interconnecting the Web of documents via those entities, different extractors have been proposed. Although they share the same main purpose (extracting named entities), they differ in numerous aspects, such as their underlying dictionary or their ability to disambiguate entities. We have developed NERD, an API and a front-end user interface powered by an ontology to unify various named entity extractors. The unified output is serialized in RDF according to the NIF specification and published back on the Linked Data cloud. We evaluated NERD with a dataset composed of five TED talk transcripts, a dataset composed of 1000 New York Times articles, and a dataset composed of the 217 abstracts of the papers published at WWW 2011.
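To make the idea of a unified, NIF-serialized output more concrete, the following is a minimal sketch and not NERD's actual implementation. It assumes NIF 2.0 core vocabulary terms (`nif:Context`, `nif:String`, `nif:beginIndex`, `nif:endIndex`, `nif:anchorOf`, `nif:referenceContext`), `itsrdf:taIdentRef` for the disambiguation link, and `#char=begin,end` fragment identifiers; the version of NIF used by NERD may differ. The `Entity` class and the `unify`, `escape`, and `to_nif_turtle` helpers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Namespaces of the NIF 2.0 core ontology and the ITS RDF vocabulary.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one extracted mention (not NERD's data model)."""
    begin: int       # character offset where the mention starts
    end: int         # character offset just past the mention
    uri: str         # URI the extractor disambiguated the mention to
    extractor: str   # illustrative extractor label, kept for provenance


def unify(results):
    """Merge entities from several extractors, dropping exact duplicates
    (same span and same URI) and keeping the first one seen."""
    seen, merged = set(), []
    for entity in results:
        key = (entity.begin, entity.end, entity.uri)
        if key not in seen:
            seen.add(key)
            merged.append(entity)
    return sorted(merged, key=lambda e: (e.begin, e.end))


def escape(literal):
    """Escape backslashes, double quotes, and newlines for a Turtle string."""
    return literal.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_nif_turtle(doc_uri, text, entities):
    """Serialize the text and its entity mentions as NIF-style Turtle."""
    lines = [f"@prefix nif: <{NIF}> .", f"@prefix itsrdf: <{ITSRDF}> .", ""]
    context = f"<{doc_uri}#char=0,{len(text)}>"
    lines.append(f'{context} a nif:Context ; nif:isString "{escape(text)}" .')
    for e in entities:
        mention = f"<{doc_uri}#char={e.begin},{e.end}>"
        lines.append(
            f"{mention} a nif:String ; nif:referenceContext {context} ; "
            f"nif:beginIndex {e.begin} ; nif:endIndex {e.end} ; "
            f'nif:anchorOf "{escape(text[e.begin:e.end])}" ; '
            f"itsrdf:taIdentRef <{e.uri}> ."
        )
    return "\n".join(lines)
```

Calling `to_nif_turtle(doc_uri, text, unify(all_entities))` on the combined output of several extractors would yield one Turtle document in which each mention is addressable by its own URI. The deduplication rule here (identical span and URI) is a deliberately simple stand-in for the ontology-driven alignment the abstract describes.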
Similar resources
NIF: An ontology-based and linked-data-aware NLP Interchange Format
We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time consuming task. Also, once a particular set of tools is integrated this integration is not...
Integrating NLP Using Linked Data
We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time consuming task. Also, once a particular set of tools is integrated, this integration is no...
Linked-Data Aware URI Schemes for Referencing Text Fragments
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The motivation behind NIF is to allow NLP tools to exchange annotations about text documents in RDF. Hence, the main prerequisite is that parts of the documents (i.e. strings) are referenceable by URIs, so that the...
Corpus Conversion, Parsing and Processing Using the NLP Interchange Format 2.0
This work presents a thorough examination and expansion of the NIF ecosystem. Both core use cases of NIF, as a format for NLP tool integration, as well as a corpus pivot format, are addressed. NLP tool integration is extended with a new wrapper for the OpenNLP framework including a NIF parser, as well as a CoNLL converter to convert the widely used CoNLL format to NIF. Tools are examined for sc...
Towards Web-Scale Collaborative Knowledge Extraction
While the Web of Data, the Web of Documents and Natural Language Processing are well researched individual fields, approaches to combine all three are fragmented and not yet well aligned. This chapter analyzes current efforts in collaborative knowledge extraction to uncover connection points between the three fields. The special focus is on three prominent RDF data sets (DBpedia, LinkedGeoData ...