Facilitating Integration of Distributed Statistical Databases Using Metadata and XML
Abstract
In a distributed statistical database environment, data selection and the actual statistical computation can be carried out at the local databases, while data integration brings together data produced by diverse sources, at different levels of detail, to produce statistical summaries. For example, the amount of beef retailed in European countries may be held in different local consumer databases and brought into each country's consumer index. The total amount of beef retailed across the countries may then be compared and summarized. Such data integration often requires associated local (data provider) and global (domain) metadata, and can be treated as a process of distributed database querying, which is accomplished either with native SQL operators or with external statistical operators that combine SELECT and GROUP BY. In this paper, we present an integration approach that was developed in a fourth framework project, ADDSIA (Access to Distributed Databases for Statistical Information and Analysis), and which is built on a semi-structured data framework consisting of local and global metadata coded in XML [1, 2]. Extensions of this approach will be used in a fifth framework project, MISSION (Multi-Agent Integration of Shared Statistical Information Over the [inter]Net), in which the mapping between the local metadata and public metadata standards will be dynamically established through agent...

Abstract: This paper describes a novel approach for integrating distributed databases. It is based on a semi-structured data framework which uses XML (eXtensible Markup Language) and metadata derived from data productions and corresponding databases.
This framework serves as an "integrated dynamic content table" that lets the user browse the relevant information and database structure before composing a query. It provides a robust facility for sending a valid query to distributed databases without prior knowledge of the component schema structure, so that a new class of statistical applications may be easily built and managed. It thereby remedies a deficiency of querying distributed databases, which otherwise relies heavily on a prior understanding of the schema structure. An initial prototype has been developed.
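The idea of running SELECT and GROUP BY locally and then integrating the partial results through XML-coded metadata can be illustrated with a minimal sketch. It is not the ADDSIA implementation: the metadata layout, the attribute names (`table`, `country`, `amount`, `factor`), the two local schemas, and the sample figures are all hypothetical, and in-memory SQLite databases stand in for the distributed sources.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical global (domain) metadata: each <source> maps one local schema
# onto the shared concepts "country" and "amount in kg".
METADATA_XML = """
<global concept="beef_retail">
  <source name="uk" table="sales" country="nation" amount="kg_sold" factor="1.0"/>
  <source name="fr" table="ventes" country="pays" amount="tonnes" factor="1000.0"/>
</global>
"""


def make_local_db(table, country_col, amount_col, rows):
    """Create an in-memory database that stands in for one local provider."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({country_col} TEXT, {amount_col} REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


def integrate(metadata_xml, connections):
    """Run a local SELECT ... GROUP BY per source, then merge in global units."""
    totals = {}
    for src in ET.fromstring(metadata_xml).findall("source"):
        conn = connections[src.get("name")]
        # The query is composed from metadata, not from prior schema knowledge.
        query = (
            f"SELECT {src.get('country')}, SUM({src.get('amount')}) "
            f"FROM {src.get('table')} GROUP BY {src.get('country')}"
        )
        factor = float(src.get("factor"))  # converts local units to kg
        for country, local_sum in conn.execute(query):
            totals[country] = totals.get(country, 0.0) + local_sum * factor
    return totals


connections = {
    "uk": make_local_db("sales", "nation", "kg_sold", [("UK", 1200.0), ("UK", 800.0)]),
    "fr": make_local_db("ventes", "pays", "tonnes", [("FR", 1.5), ("FR", 0.5)]),
}
totals = integrate(METADATA_XML, connections)
grand_total = sum(totals.values())
print(totals, grand_total)
```

The caller only names the global concept. Table names, column names, and unit conversions come from the metadata, which is the property the abstract describes as querying without prior knowledge of the component schemas.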
Similar resources
A Metadata Integration Assistant Generator for Heterogeneous Distributed Databases
This paper describes a metadata interchange approach for semi-automated integration of heterogeneous distributed databases. Our system prototype uses distributed metadata to generate a GUI tool for a meta-user (who does the metadata integration) to describe mappings between master and local databases by assigning index numbers and specifying conversion function names; the system uses Quilt as i...
Establishing an XML metadata knowledge base to assist integration of structured and semi-structured databases
This paper describes the establishment of an XML Metadata Knowledge Base (XMKB) to assist integration of distributed heterogeneous structured data residing in relational databases and semi-structured data held in well-formed XML documents (XML documents that conform to the XML syntax rules but have no referenced DTD or XML schema) produced by internet applications. We propose an approach to comb...
FACILITATING INTERDISCIPLINARY SCIENCES BY THE INTEGRATION OF A CLOSi-BASED DATABASE WITH BIO-METADATA
Biodiversity information has been collected and compiled during many unrelated and independent projects across the Amazon region. Institutions on their own may be unable to answer crucial questions, as the answers may depend on a multi-disciplinary context. Since these institutions are still considerably isolated, some of the solutions adopted impose redundancy, leading to high costs. The use of computer tech...
Scalable Hybrid Search on Distributed Databases
We have previously described a hybrid keyword search that combines metadata search with a traditional keyword search over unstructured context data. This hybrid search paradigm provides the inquirer additional options to narrow the search with some semantic aspect from the XML metadata query. But in earlier work, we experienced the scalability limitations of a single-machine implementation. In ...
Metadata Services for Distributed Event Stream Processing Agents
Enterprise-level applications are becoming more complex, with the need for event and stream processing, multiple query processing, and data analysis over heterogeneous data sources such as relational databases and XML data. Such applications require access to the metadata for these different data sources. This paper discusses the design and implementation of a service-based dynamic metadata...