Classical Art Semantics Information Extraction: CASIE Pilot Project
Abstract
The paper discusses the application of Natural Language Processing (NLP) techniques to the semantic annotation of classical art texts, using rule-based Information Extraction (IE) combined with ontological and domain vocabulary input. CASIE (Classical Art Semantics Information Extraction) was a pilot collaborative project between the Hypermedia Research Unit (University of South Wales) and the Beazley Archive (Oxford University). It aimed to automatically extract information about cultural objects from scholarly classical art texts and to represent this information in terms of the ISO metadata standard for cultural heritage, the International Council of Museums' CIDOC Conceptual Reference Model (CRM). In total, 12 documents (fascicules, i.e. high-quality catalogues) were processed. They originate from the Corpus Vasorum Antiquorum (CVA) collection, which contains over 350 high-quality catalogues of mostly ancient Greek painted pottery, illustrating more than 100,000 vases. The extracted information was expressed as interoperable RDF graphs consistent with the CLAROS project format. CIDOC-CRM plays a central role in enabling semantic interoperability across the range of datasets that contribute to CLAROS. The CASIE pilot enabled a complementary exploitation of terminological and ontological resources via rule-based information extraction techniques, delivering semantic annotation with respect to the CRM in the broader field of digital humanities.
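The general idea of vocabulary-driven, rule-based extraction that emits CRM-style RDF statements can be illustrated with a minimal Python sketch. This is not the CASIE pipeline, whose rules and vocabularies are not given in the abstract. The tiny shape vocabulary, the `example.org` URIs, the function names, and the choice of the `P2_has_type` property and `E55_Type` class are all assumptions made for illustration, as is the CRM namespace string.

```python
import re

# Hypothetical mini-vocabulary (assumption): surface term -> concept URI.
SHAPE_VOCAB = {
    "amphora": "http://example.org/vocab/shape/amphora",
    "kylix": "http://example.org/vocab/shape/kylix",
    "lekythos": "http://example.org/vocab/shape/lekythos",
}

# Namespace strings (the CRM namespace used here is an assumption).
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A single rule: any vocabulary term, matched as a whole word, ignoring case.
SHAPE_RULE = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SHAPE_VOCAB) + r")\b",
    re.IGNORECASE,
)


def extract_shape_triples(object_uri, text):
    """Return (subject, predicate, object) triples for shape terms found in text."""
    triples = []
    seen = set()
    for match in SHAPE_RULE.finditer(text):
        concept = SHAPE_VOCAB[match.group(1).lower()]
        if concept in seen:
            continue  # emit each concept only once per object
        seen.add(concept)
        triples.append((object_uri, CRM + "P2_has_type", concept))
        triples.append((concept, RDF_TYPE, CRM + "E55_Type"))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    sample = "Fragment of a black-figure Amphora; compare the kylix in Oxford."
    print(to_ntriples(extract_shape_triples("http://example.org/object/1", sample)))
```

A real system would need far richer rules (context, negation, relations between entities) and a curated thesaurus; the sketch only shows how a vocabulary match can be turned into an ontology-typed statement.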
Similar resources
Entity Extraction via Ensemble Semantics
Combining information extraction systems yields significantly higher quality resources than each system in isolation. In this paper, we generalize such a mixing of sources and features in a framework called Ensemble Semantics. We show very large gains in entity extraction by combining state-of-the-art distributional and pattern-based systems with a large set of features from a webcrawl, query lo...
WikiPhiloSofia and PanAnthropon: Extraction and Visualization of Facts, Relations, and Networks for a Digital Humanities Knowledge Portal
Wikipedia, with its unique structural features and a vast amount of user-generated content, is being increasingly recognized as a valuable knowledge source for various applications. Nevertheless, the mode of information search and retrieval on Wikipedia remains that of conventional keyword-based search and retrieval. The objective of my (soon-to-be-proposed) thesis project, entitled PanAnthropo...
Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages
Reverse engineering of network applications, especially from the security point of view, is of high importance and interest. Many network applications use proprietary protocols whose specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...
Boemie: Bootstrapping Ontology Evolution with Multimedia Information Extraction
The BOEMIE project proposes a bootstrapping approach to knowledge acquisition, which uses multimedia ontologies for fused extraction of semantics from multiple modalities, and feeds back the extracted information, aiming to automate the ontology evolution process.
Deliverable D2.2: Semantics
for dissemination) This deliverable presents the state-of-the-art on “Semantics Extraction from visual content”. We first give an overview of OCR methodologies dealing with visual content, namely image and video, that provide textual information. Although, this kind of analysis does not provide directly any semantics, it is a critical step involving visual content, which feeds BOEMIE’s module d...