Extracting Knowledge for Cultural Heritage Knowledge Base Population
Abstract
The entity-oriented description of the world is a major current trend, motivated by the need for semantic services that support people in finding information, learning and discovering new knowledge, and broadening their existing knowledge horizons. Entities, managed in semantic knowledge bases, have the potential to be the backbone of these new and innovative services. Automatically extracting facts from various data sources and populating knowledge bases is therefore the challenge studied in this work. This thesis proposes methods for knowledge extraction in the cultural heritage domain. Extracting knowledge from cultural heritage metadata is by no means a trivial task, and there are often problems with missing or ambiguous information. An integral part of this work is therefore dedicated to developing pattern-based techniques that extract knowledge from natural language documents to complement the knowledge extracted from metadata. However, the proposed framework is not limited to working in conjunction with metadata extraction; it also supports an independent, continuous mode of operation, i.e. patterns learned during extraction are subsequently used to mine new knowledge.
In summary, the main contributions of this thesis are:
• FRBR-ML: a generic framework for exploiting metadata which includes: (i) a method to extract entities, attributes and relationships from existing legacy metadata, (ii) novel techniques for correction, enhancement and semantic enrichment of the metadata, and (iii) metrics to assess the quality of extraction.
• SPIDER: a prototype that supports extraction of relational facts at Web scale. Contrary to most knowledge extraction approaches, we tackle the problem of uniquely identifying entities, both to extend their list of spelling forms and to facilitate matching to LOD entities. Furthermore, in addition to the flexible pattern definition scheme, SPIDER enables a provenance-aware extraction method that prudently refines extracted facts by considering the PageRank and SpamScore as well as the relevance score of the source document.
• KIEV: a prototype that takes the development of SPIDER to the next stage, namely by enabling verification of facts using two evidence-based techniques:
Similar resources
Integrating Wiki Systems, Natural Language Processing, and Semantic Technologies for Cultural Heritage Data Management
Modern documents can easily be structured and augmented to have the characteristics of a semantic knowledge base. Many older documents may also hold a trove of knowledge that would deserve to be organized as such a knowledge base. In this chapter, we show that modern semantic technologies offer the means to make these heritage documents accessible by transforming them into a semantic knowledge ...
An Ontology-Based Method for Extracting and Classifying Domain-Specific Compositional Nominal Compounds
In this paper, we present our preliminary study on an ontology-based method to extract and classify compositional nominal compounds in specific domains of knowledge. This method is based on the assumption that, by applying a conceptual model to represent a knowledge domain, it is possible to improve the extraction and classification of lexicon occurrences for that domain in a semi-automatic way. We ...
A Novel Vision for Navigation and Enrichment in Cultural Heritage Collections
In the cultural heritage domain, there is huge interest in utilizing semantic web technology and building services that enable users to query, explore and access the vast body of cultural heritage information that has been created over decades by memory institutions. For successful conversion of existing data into semantic web data, however, there is often a need to enhance and enrich the legacy da...
Deliverable D4.3 Demonstrator for final web services
SMARTMUSEUM (Cultural Heritage Knowledge Exchange Platform) is a research and development project sponsored under the European Commission’s 7th Framework Programme. The overall objective of the project is to develop a platform for innovative services enhancing on-site personalized access to digital cultural heritage through adaptive and privacy-preserving user profiling. Using on-site knowledge database...
Cultural learning across the Smart City
Public involvement in defining and interpreting cultural heritage offers many benefits, including improved learning opportunities for individuals and a broader base of knowledge about art and heritage. This knowledge can in turn be used for better, smarter, information provision in the future. This paper proposes how to capture, analyse and present cultural information from different viewpoints...