Logical Schema Acquisition from Text-Based Sources for Structured and Non-Structured Biomedical Sources Integration
Abstract
In this paper we present a novel approach to integrating non-structured and structured sources of biomedical information. We build on previous research on database integration conducted in the context of the EC-funded INFOGENMED project. In this project we developed the ONTOFUSION system, which provides a robust framework for integrating large sets of structured biomedical sources. The methods and tools provided by ONTOFUSION cannot be used to integrate non-structured sources, since the latter usually lack a logical schema. In this article we introduce a novel method to extract logical schemas from text-based collections of biomedical information. Non-structured sources equipped with a logical schema can be regarded as regular structured sources, and can thus be bridged together using the methods and tools provided by ONTOFUSION. To test the validity of this approach, we carried out an experiment with a set of five cancer databases.
Similar resources
Combining data integration and information extraction
Improving the ability of computer systems to process text is a significant research challenge. Many applications are based on partially structured databases, where structured data conforming to a schema is combined with free text. Information is stored as text in these applications because the queries required...
An ontology-based approach for resolving semantic schema conflicts in the extraction and integration of query-based information from heterogeneous web data sources
There are many external resources and heterogeneous data on the internet that an organization or user may need to improve the decision-making process. It is therefore critical that this information is complete and precise and can be acquired on time. Most web sources provide data in semi-structured form on the internet. The combination of semi-structured data from different so...
Graph-Based Weakly-Supervised Methods for Information Extraction & Integration
The variety and complexity of potentially related data resources available for querying (webpages, databases, data warehouses) has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of resea...
Unstructured information integration through data-driven similarity discovery
Information integration from multiple heterogeneous sources is one of the major challenges facing enterprises and service providers today, and one of the important problems in this domain is the integration of structured and unstructured (or text) data. In this paper we describe our work on a data-driven approach to integrating various sources of text data, without relying on the availability o...
Virtualization of Heterogeneous Data Sources for Grid Information Systems
Grid Information Systems will use existing data from various distributed and heterogeneous data stores as well as new data entering the organization. Several technical obstacles arise in the design and implementation of a system for integration of such data sources – most notably distribution, autonomy, and data heterogeneity. This paper describes the data integration system based on the wrappe...
Journal title:
- AMIA ... Annual Symposium proceedings. AMIA Symposium
Publication date: 2007