Fast Answering of XPath Query Workloads on Web Collections
Abstract
Several web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly large web collections containing hundreds of thousands of XML documents. Even when a task deals with only a single document at a time, developers benefit from understanding the behaviour of XPath expressions across multiple documents (e.g., what will a query return when run over the thousands of hourly feeds collected during the last few months?). Dealing with the highly variable structure of such web collections poses additional challenges. This paper introduces DescribeX, a framework capable of describing arbitrarily complex XML summaries of web collections, enabling the efficient evaluation of XPath workloads (supporting all the axes and language constructs in XPath). Experiments validate that DescribeX enables existing document-at-a-time XPath tools to scale up to multi-gigabyte XML collections.
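To make the idea of a structural summary concrete, the following is a minimal sketch, not DescribeX itself. It assumes a plain label-path summary (each root-to-element path mapped to the documents that contain it) and handles only absolute, child-axis-only paths, which is far narrower than the full XPath support the abstract describes. The names `build_path_summary`, `evaluate`, and `SAMPLE_DOCS` are hypothetical and introduced only for illustration. The summary lets a document-at-a-time tool (here, Python's `xml.etree.ElementTree`) skip documents that cannot match.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical toy collection; real web collections would be read from disk.
SAMPLE_DOCS = {
    "feed1": "<rss><channel><item><title>A</title></item>"
             "<item><title>B</title></item></channel></rss>",
    "feed2": "<feed><entry><title>C</title></entry></feed>",
    "feed3": "<rss><channel><title>News</title></channel></rss>",
}

def build_path_summary(docs):
    """Map each root-to-element label path to the ids of documents containing it."""
    summary = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        stack = [(root, "/" + root.tag)]
        while stack:
            node, path = stack.pop()
            summary[path].add(doc_id)
            for child in node:
                stack.append((child, path + "/" + child.tag))
    return summary

def evaluate(docs, summary, abs_path):
    """Evaluate an absolute child-axis-only path on candidate documents only."""
    steps = abs_path.strip("/").split("/")
    # ElementTree paths are relative to the root element, so drop the first step.
    rel = "/".join(steps[1:])
    results = {}
    # Assumption: the summary is used purely as a filter; documents whose
    # structure lacks the path are never parsed at query time.
    for doc_id in sorted(summary.get(abs_path, ())):
        root = ET.fromstring(docs[doc_id])
        nodes = root.findall(rel) if rel else [root]
        results[doc_id] = [node.text for node in nodes]
    return results

if __name__ == "__main__":
    summary = build_path_summary(SAMPLE_DOCS)
    print(evaluate(SAMPLE_DOCS, summary, "/rss/channel/item/title"))
```

In this sketch the summary is built once per collection and reused across a workload of queries, so the cost of scanning every document is paid up front rather than per query.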
Similar resources
DescribeX: A Framework for Exploring and Querying XML Web Collections
Flavio Rizzolo, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2008. The nature of semistructured data in web collections is evolving. Even when XML web documents are valid with regard to a schema, the actual structure of such documents exhibits significant variations across collections for s...
Earliest Query Answering for Deterministic Streaming Tree Automata and a Fragment of XPath
We study the concept of earliest query answering as needed for streaming XML processing with optimal memory management. We derive lower complexity bounds showing that earliest query answering for Forward XPath is not feasible in polynomial time combined complexity except if P=NP. We then distinguish a fragment of Forward XPath with negation that enjoys P-time earliest query ...
XPath for DL Ontologies
Applications of description logics (DLs) such as OWL 2 and ontology-based data access (OBDA) require understanding of how to pose database queries over DL knowledge bases. While there have been many studies regarding traditional relational query formalisms such as conjunctive queries and their extensions, little attention has been paid to graph database queries, despite the fact that graph data...
Early Nested Word Automata for XPath Query Answering on XML Streams
Algorithms for answering XPath queries on XML streams have been studied intensively in the last decade. Nevertheless, there still exists no solution with high efficiency and large coverage. In this paper, we introduce early nested word automata in order to approximate earliest query answering algorithms for nested word automata in a highly efficient manner. We show that this approximation can b...
XXL @ INEX 2003
Information retrieval on XML combines retrieval on content data (element and attribute values) with retrieval on structural data (element and attribute names). Standard query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. Such search condi...