KeyX: selective key-oriented indexing in native XML-databases

Author

  • Beda Christoph Hammerschmidt
Abstract

In the world of Relational Database Management Systems (RDBMS), indexes are used to accelerate specific queries. The selection of indexes is an important database-tuning task, performed either by a database administrator or by an index selection tool that suggests a set of suitable indexes. In this paper we transfer the concept of specific indexes to XML Database Management Systems (XDBMS) and present an implementation that uses the queries that actually occur to optimize the performance of an XML database system by automatically creating suitable indexes. We introduce an index approach, called the key-oriented XML index, that uses specific XML element values and attribute values as keys referencing arbitrary nodes in the data. We transfer the well-known Index Selection Problem (ISP) to XDBMS. To solve the ISP, a workload of database operations is analyzed and a set of specific indexes that minimizes the total execution time is suggested. Because the ISP is NP-complete, we apply heuristics to find a solution with reduced complexity. Experimental results of the prototypical implementation of the key-oriented XML indexes on top of a native XDBMS demonstrate that our approach significantly improves query execution time with only moderate additional storage requirements. Because the workload is analyzed periodically and suitable indexes are created or dropped automatically by solving the ISP, our approach guarantees high performance over the total lifetime of a database.

Published: in [41]

Title: A selective key-oriented XML Index for the Index Selection Problem in XDBMS
Authors: B. C. Hammerschmidt, M. Kempa and V. Linnemann
Abstract: In relational database management systems, indexes are used to accelerate specific queries. The selection of indexes is an important task when tuning a database, performed either by a database administrator or by an index propagation tool that suggests a set of suitable indexes.
In this paper we introduce a new index approach, called the key-oriented XML index (KeyX), that uses specific XML element or attribute values as keys referencing arbitrary nodes in the XML data. KeyX is selective to specific queries, avoiding effort spent on elements that are never queried. This concept reduces memory consumption and unproductive index updates. We transfer the Index Selection Problem (ISP) to XDBMS. Applying the ISP, a workload of database operations is analyzed and a set of selective indexes that minimizes the total execution time for the workload is suggested. Because the workload is analyzed periodically and suitable indexes are created or dropped automatically, our implementation of KeyX guarantees high performance over the total lifetime of a database.
Published: in [44]

10.4. LIST OF PUBLICATIONS

Title: Comparisons and Performance Measurements of XML Index Structures
Authors: B. C. Hammerschmidt, M. Kempa and V. Linnemann
Abstract: Indexes are used to accelerate queries in database management systems (DBMS). In relational DBMS, indexes are broadly explored, whereas indexes in XML DBMS are still an active field of research. A multitude of approaches with different characteristics have been introduced in the past. Approaches that are not selective to specific queries require the whole XML data to be indexed and may lead to enormous space consumption and poor performance if the XML data changes often. With KeyX we have introduced a selective and key-oriented approach for indexing only the relevant parts of the XML data in a database. This work provides qualitative comparisons and performance measurements of recent approaches to XML indexing. We explain why key-oriented indexing, which is derived from the relational world, performs well in the XML context as well.
Published: in [42]

Title: Autonomous Index Optimization in XML Databases
Authors: B. C. Hammerschmidt, M. Kempa and V. Linnemann
Abstract: Defining suitable indexes is a major task when optimizing a database. Usually, a human database administrator defines a set of indexes in the design phase of the database. This can be done manually or with the help of so-called index wizard tools that analyze predefined database operations. Even with an optimal initial set of indexes when setting up a database, there is no guarantee that these indexes will suit future demands. Rather, it is realistic that the typical usage of the database will change after a while, for instance because new queries appear. As a consequence, the existing indexes become suboptimal. The typical way to handle this problem is for a database administrator to maintain the database permanently. In XML database management systems (XDBMS) this problem becomes even worse: because XML queries cover both content and structure, the number of possible queries and indexes is significantly higher. Additionally, for XML data without schema information, queries and indexes cannot be defined in advance, because the structure and the content of the data are not restricted. Both facts tend to result in higher maintenance costs for XML indexes compared to relational indexes. In this paper we show by performance measurements that an adaptive XDBMS that analyzes its workload periodically and creates or drops XML indexes automatically guarantees high performance over the total lifetime of a database. Although we present our index system, called KeyX, the idea and the results are transferable to other XML indexing approaches.
Published: in [45]

CHAPTER 10. APPENDIX

Title: The Index Update Problem for XML Data in XDBMS
Authors: B. C. Hammerschmidt, M. Kempa and V. Linnemann
Abstract: Database Management Systems are a major component of almost every information system. In relational Database Management Systems (RDBMS), indexes are well known and essential for the efficient execution of frequent queries.
For XML Database Management Systems (XDBMS) no index standards are established yet, although they are needed no less. An inevitable side effect of any index is that modifications of the indexed data have to be reflected by the index structure itself. This leads to two problems: first, it has to be determined whether a modifying operation affects an index or not. Second, if an index is affected, it has to be updated efficiently, ideally without rebuilding the whole index. In recent years many approaches have been introduced for indexing XML data in an XDBMS. All of them are more or less weak when it comes to updates. In this paper we give an algorithm that is based on finite automaton theory and determines whether an XPath-based database operation affects an index that is defined universally upon keys, qualifiers and a return value of an XPath expression. In addition, we give algorithms for updating our KeyX indexes efficiently if they are affected by a modification. The Index Update Problem is relevant for all applications that use a secondary XML data representation (e.g. indexes, caches, XML replication/synchronization services) where updates must be identified and realized.
Published: in [47]

Title: XDLT: A Distance Learning Tool for consistent teaching of XML and related Technologies
Authors: B. C. Hammerschmidt, P. Stursberg, J. Jungclaus and V. Linnemann
Abstract: The eXtensible Markup Language (XML) has become an important data format in the e-learning world in recent years. A multitude of e-learning systems take advantage of XML for various purposes: to represent knowledge or content, for information exchange between distributed applications, or just for platform-independent storage of data. Although in most cases XML is a technical matter of data representation and application architecture, an emerging need for students and teachers to learn XML and XML-related technologies can be observed.
For instance, a person who describes entities of a given domain with an XML-based ontology needs domain-specific knowledge and a certain degree of XML skill to express that knowledge. Current approaches to learning XML, such as tutorials and XML editors, fall short in guidance, in monitoring of the learning process, and in the interoperability of different XML-related technologies such as XML data modeling (DTD), XML transformation, and query as well as update languages (XPath, XUpdate). With this paper we introduce a web-based distance teaching and learning system that teaches the fundamentals of XML and major XML-related technologies. In contrast to interactive tutorials, which operate mostly with fixed XML examples, and XML editors, which offer no guidance for the learner, our approach enables a student to learn XML and related technologies based on custom data and exercises that can be defined and monitored by a teacher.
Published: in [48]

Title: On the Intersection of XPath Expressions
Authors: B. C. Hammerschmidt, M. Kempa and V. Linnemann
Abstract: XPath is a common language for selecting nodes in an XML document. XPath uses so-called path expressions, which describe a navigation path through semistructured data. In recent years some of the characteristics of XPath have been discussed. Examples include the containment of two XPath expressions p and p′ (p ⊆ p′). To the best of our knowledge, the intersection of two XPath expressions (p ∩ p′) has not been treated yet. The intersection of p and p′ is the set that contains all XML nodes that are selected both by p and by p′. In the context of indexes in XML databases, the emptiness of the intersection of p and p′ is a major issue when updating the index. In order to keep the index consistent with the indexed data, it has to be detected whether an index that is defined upon p is affected by a modifying database operation with the path expression p′.
In this paper we introduce the intersection problem for XPath and motivate its relevance. We present an efficient intersection algorithm, based on finite automata, for XPath expressions without the NOT operator. For expressions that contain the NOT operator, the intersection problem becomes NP-complete, leading to exponential computations in general. With an average-case simulation we show that the NP-completeness is no significant limitation for most real-world database operations.
Published: in [46]
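
The automaton-based emptiness test mentioned in the last abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a small XPath fragment (absolute paths built only from `/`, `//`, element names and `*`, with no predicates and no NOT operator), and the names `parse`, `moves`, `intersects` and `OTHER` are hypothetical. Each expression is read as a nondeterministic automaton over root-to-node element-name sequences, and the two automata are explored in lockstep; the intersection is non-empty exactly when both can reach their final state on the same sequence.

```python
import re
from collections import deque

OTHER = "\0"  # stand-in for any element name not mentioned in either expression


def parse(path):
    """Split '/a//b/*' into [('child', 'a'), ('desc', 'b'), ('child', '*')]."""
    steps = [("desc" if sep == "//" else "child", name)
             for sep, name in re.findall(r"(//|/)(\*|[\w.-]+)", path)]
    rebuilt = "".join(("//" if axis == "desc" else "/") + name
                      for axis, name in steps)
    if rebuilt != path:  # anything outside the assumed fragment is rejected
        raise ValueError(f"unsupported path expression: {path!r}")
    return steps


def moves(steps, state, symbol):
    """Automaton successors of `state` after reading one element name."""
    if state == len(steps):
        return set()              # final state: the selected node ends the path
    axis, label = steps[state]
    nxt = set()
    if axis == "desc":
        nxt.add(state)            # '//' may skip any number of elements
    if label == "*" or label == symbol:
        nxt.add(state + 1)        # the step itself matches this element
    return nxt


def intersects(p, q):
    """True if some root-to-node path can be selected by both p and q."""
    a, b = parse(p), parse(q)
    # Names used in the expressions plus one symbol for "anything else".
    alphabet = {label for _, label in a + b if label != "*"} | {OTHER}
    start, goal = (0, 0), (len(a), len(b))
    seen, queue = {start}, deque([start])
    while queue:                  # breadth-first search over the product automaton
        i, j = queue.popleft()
        if (i, j) == goal:
            return True
        for symbol in alphabet:
            for ni in moves(a, i, symbol):
                for nj in moves(b, j, symbol):
                    if (ni, nj) not in seen:
                        seen.add((ni, nj))
                        queue.append((ni, nj))
    return False
```

In the index-update setting, a call such as `intersects(index_path, update_path)` returning `False` would mean that the update cannot touch nodes selected by the index path, so that index could be left alone. The product automaton has at most (|p| + 1) · (|q| + 1) states, so the check is polynomial for this fragment.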

Similar resources

A Selective Key-Oriented XML Index for the Index Selection Problem in XDBMS

In relational database management systems, indexes are used to accelerate specific queries. The selection of indexes is an important task when tuning a database, performed either by a database administrator or by an index propagation tool that suggests a set of suitable indexes. In this paper we introduce a new index approach, called the key-oriented XML index (KeyX), that uses specific XML element or...

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As we move to a big data world in which unstructured data grows at high velocity, a data store is needed to hold this large amount of data. Traditionally, relational databases were used, but they are no longer suited to handling this much data, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...

Indexing Methods for XML Documents

There has been much research about XML storage and information retrieval. Traditionally XML documents are mapped onto relational databases. Then the data can be queried by relational queries. Nowadays many digital libraries store XML documents in a native XML database. Native means that the documents are stored and retrieved in their original format. An important task of such a database is inde...

The Geometric Approach for Indexing XML data

Native XML databases are currently a very active topic. They allow XML data to be stored and queried efficiently. In this paper we introduce the geometric framework for XML data storage and retrieval. Our approach exploits the properties of vector spaces for structural indexing of XML and efficient exact matching queries while the second model uses the properties of metric s...


Publication date: 2005