Data conversion, extraction and record linkage using XML and RDF tools in Project SIMILE
Abstract
SIMILE is a joint project between MIT Libraries, the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), HP Labs and the World Wide Web Consortium (W3C). It is investigating the application of Semantic Web tools, such as the Resource Description Framework (RDF), to the problem of dealing with heterogeneous metadata. This report describes how XML and RDF tools are used to perform data conversion, extraction and record linkage on sample datasets featuring visual images (ARTstor) and learning objects (OpenCourseWare) in the first SIMILE proof-of-concept demo.
Similar resources
A Web Application using RDF/RDFS for Metadata Navigation
This paper describes using RDF/RDFS/XML to create and navigate a metadata model of relationships among entities in text. The metadata we create is roughly an order of magnitude smaller than the content being modeled, and it provides the end user with context-sensitive information about the hyperlinked entities in focus. These entities at the core of the model are originally found and resolved using...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...
The PIA Project: Learning to Semantically Annotate Texts from an Ontology and XML-Instance Data
The development of the XML and RDF(S) standards offers a positive environment for machine learning to enable the automatic XML-annotation of texts, which can encourage the extension of Semantic Web applications. After reviewing the current limitations of information extraction technology, specifically its lack of portability to new domains, we introduce the PIA project for automatically XML-annota...
Exploiting Semantic Web Technologies for Intelligent Access to Historical Documents
The FDR/Pearl Harbor Project involves the enhancement of materials drawn from the Franklin D. Roosevelt Library and Digital Archives, which includes a range of image, sound, video and textual data. The project is undertaking the encoding, annotation, and multi-modal linkage of a portion of the collection, and enhancement of a web-based interface that enables exploitation of state-of-the-art meth...