Benefit and Cost of Query Answering in PDMS

Authors

  • Armin Roth
  • Felix Naumann
Abstract

Peer data management systems (PDMS) are a natural extension of integrated information systems. They consist of a dynamic set of autonomous peers, each of which can mediate between heterogeneous schemas of other peers. A new data source joins a PDMS by defining a semantic mapping to one or more other peers, thus forming a network of peers. Queries submitted to a peer are answered with data residing at that peer and with data reached along paths of mappings through the network of peers. However, without optimization methods, query reformulation in PDMS is very inefficient due to redundancy in mapping paths. We present a decentralized strategy that guides peers in deciding along which further mappings a query should be sent. The strategy uses statistics of the peer's own data and statistics of mappings to neighboring peers to predict whether it is worthwhile to send the query to that neighbor, or whether the query plan should be pruned at this point. These decisions are guided by a benefit and cost model that trades off the amount of data a neighbor will pass back against the execution cost of that step. Thus, we allow a high scale-up of PDMS in the number of participating peers.

1 PDMS and Data Quality

Integrating semantically relevant information is a pressing problem. In practice, a decentralized P2P style of data sharing is preferred over centralized data integration systems. Users want to pose queries against their own schema and let the queries be transferred via schema mappings to similar peers in the neighborhood. Such requirements are addressed by peer data management systems (PDMS) [1–3]. Peers serve both as data sources and as mediators, and queries are translated and transferred using semantic relationships between peers, so-called mappings, as shown in Fig. 1.
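The neighbor-selection decision described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual benefit and cost model, so everything here is an assumption made for illustration: the `NeighborStats` fields, the simple benefit-to-cost ratio, and the `min_ratio` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NeighborStats:
    """Hypothetical per-neighbor estimates derived from local statistics."""
    name: str
    expected_tuples: float  # assumed benefit: data the neighbor would pass back
    execution_cost: float   # assumed cost of forwarding the query along this mapping


def select_neighbors(neighbors: List[NeighborStats], min_ratio: float) -> List[str]:
    """Return the neighbors worth forwarding to; all others are pruned.

    Uses a plain benefit/cost ratio compared against a threshold.
    This is an illustrative stand-in, not the model from the paper.
    """
    selected = []
    for n in neighbors:
        if n.execution_cost <= 0:
            # A free step is worthwhile whenever it returns any data.
            ratio = float("inf") if n.expected_tuples > 0 else 0.0
        else:
            ratio = n.expected_tuples / n.execution_cost
        if ratio >= min_ratio:
            selected.append(n.name)
    return selected


if __name__ == "__main__":
    stats = [
        NeighborStats("P2", expected_tuples=100.0, execution_cost=10.0),
        NeighborStats("P3", expected_tuples=5.0, execution_cost=10.0),
    ]
    print(select_neighbors(stats, min_ratio=1.0))
```

Each peer makes this decision using only its own statistics and those of its outgoing mappings, which is what keeps the strategy decentralized; how the estimates are obtained is outside the scope of this sketch.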
Example application areas include partnerships between companies for developing complex technical products, cooperation among scientific institutions, and ad hoc crisis management [2]. PDMS can also serve as a decentralized infrastructure for mediation between ontologies in the semantic web. Like any information system integrating data from autonomous sources, PDMS are vulnerable to poor data quality in the sources, poor mappings to the sources, and thus poor data

Similar resources

Control in XML PDMS Query Answering

A peer data management system (PDMS) is a decentralized system in which each peer is autonomous and has its own schema and database. With the help of pairwise schema mappings built between any two relevant peers, a query at one peer can be rewritten and broadcast to the whole PDMS. Answers from multiple peers are then returned to the querying peer. In our thesis, we exploit the access control iss...


Efficient query answering in peer data management systems

Peer data management systems (Pdms) consist of a highly dynamic set of autonomous, heterogeneous peers connected with schema mappings. Queries submitted at a peer are answered with data residing at that peer and by passing the queries to neighboring peers. Pdms are the most general architecture for distributed integrated information systems. With no need for central coordination, Pdms are highl...


System P: Completeness-driven Query Answering in Peer Data Management Systems

Peer data management systems (PDMS) are a highly dynamic, decentralized infrastructure for large-scale data integration. They consist of a dynamic set of autonomous peers inter-connected with a network of schema mappings. Queries submitted at a peer are answered with local data and by data that is reached along paths of mappings. Due to redundancies in the mapping network, query answering in PD...


Completeness-driven Query Answering in Peer Data Management Systems

Peer data management systems (Pdms) consist of a dynamic set of autonomous and heterogeneous peers connected with schema mappings. Queries submitted at a peer are answered with data residing at that peer and by recursively passing the query along the mappings to neighboring peers. Due to massive redundancy in mapping paths from the querying peer to any peer in the network, Pdms tend to be very i...


Efficient and Effective Query Answering in a PDMS with SUNRISE

Peer Data Management Systems (PDMSs) have been recently proposed as an evolution of Peer-To-Peer (P2P) systems toward a more semantics-based description of peers’ contents and relationships. In a PDMS scenario a key challenge is query routing, i.e. the capability of selecting small subsets of semantically relevant peers to forward a query to. In this paper we demonstrate SUNRISE (System for Uni...



Publication date: 2005