Toward a view-oriented approach for aligning RDF-based biomedical repositories.

نویسندگان

  • A Anguita
  • M García-Remesal
  • D de la Iglesia
  • N Graf
  • V Maojo
چکیده

INTRODUCTION This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". BACKGROUND The need for complementary access to multiple RDF databases has fostered new lines of research, but also entailed new challenges due to data representation disparities. While several approaches for RDF-based database integration have been proposed, those focused on schema alignment have become the most widely adopted. All state-of-the-art solutions for aligning RDF-based sources resort to a simple technique inherited from legacy relational database integration methods. This technique - known as element-to-element (e2e) mappings - is based on establishing 1:1 mappings between single primitive elements - e.g. concepts, attributes, relationships, etc. - belonging to the source and target schemas. However, due to the intrinsic nature of RDF - a representation language based on defining tuples < subject, predicate, object > -, one may find RDF elements whose semantics vary dramatically when combined into a view involving other RDF elements - i.e. they depend on their context. The latter cannot be adequately represented in the target schema by resorting to the traditional e2e approach. These approaches fail to properly address this issue without explicitly modifying the target ontology, thus lacking the required expressiveness for properly reflecting the intended semantics in the alignment information. OBJECTIVES To enhance existing RDF schema alignment techniques by providing a mechanism to properly represent elements with context-dependent semantics, thus enabling users to perform more expressive alignments, including scenarios that cannot be adequately addressed by the existing approaches. METHODS Instead of establishing 1:1 correspondences between single primitive elements of the schemas, we propose adopting a view-based approach. The latter is targeted at establishing mapping relationships between RDF subgraphs - that can be regarded as the equivalent of views in traditional databases -, rather than between single schema elements. This approach enables users to represent scenarios defined by context-dependent RDF elements that cannot be properly represented when adopting the currently existing approaches. RESULTS We developed a software tool implementing our view-based strategy. Our tool is currently being used in the context of the European Commission funded p-medicine project, targeted at creating a technological framework to integrate clinical and genomic data to facilitate the development of personalized drugs and therapies for cancer, based on the genetic profile of the patient. We used our tool to integrate different RDF-based databases - including different repositories of clinical trials and DICOM images - using the Health Data Ontology Trunk (HDOT) ontology as the target schema. CONCLUSIONS The importance of database integration methods and tools in the context of biomedical research has been widely recognized. Modern research in this area - e.g. identification of disease biomarkers, or design of personalized therapies - heavily relies on the availability of a technical framework to enable researchers to uniformly access disparate repositories. We present a method and a tool that implement a novel alignment method specifically designed to support and enhance the integration of RDF-based data sources at schema (metadata) level. This approach provides an increased level of expressiveness compared to other existing solutions, and allows solving heterogeneity scenarios that cannot be properly represented using other state-of-the-art techniques.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrating XQuery-enabled SCORM XML Metadata Repositories into an RDF-based E-Learning P2P Network

Edutella is an RDF-based E-Learning P2P network that is aimed to accommodate heterogeneous learning resource metadata repositories in a P2P manner and further facilitate the exchange of metadata between these repositories based on RDF. Whereas Edutella provides RDF metadata repositories with a quite natural integration approach, XML metadata repositories have to overcome considerable incompatib...

متن کامل

A new approach for storing RDF triples based on ontology modularization

Managing RDF data in an effective manner is one of significant factors in realizing semantic web vision. Currently, there are two general approaches for storing RDF data which are column-oriented and row-oriented and each one has advantages and disadvantages. These two approaches are used in relational DBMSs. After reviewing different approaches and methods for storing RDF triples, we put forwa...

متن کامل

NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL que...

متن کامل

SPARQL queries to RDFS views of Topic Maps

Both Topic Maps and RDF are popular semantic web standards designed for machine processing of web documents. Since these representations were originally created for different purposes, they have differences both in their concepts and in their data models. The Topic Map data model can be seen as an ontology for building indices over web documents, whereas RDF is a language to define arbitrary pr...

متن کامل

Massive-Scale RDF Processing Using Compressed Bitmap Indexes

The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQLlike syntax. SPARQL quer...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Methods of information in medicine

دوره 54 1  شماره 

صفحات  -

تاریخ انتشار 2015