Data Driven XPath Generation
Authors
Abstract
The XPath query language offers a standard for extracting information from HTML documents. For this purpose, the DOM tree representation, which models the hierarchical structure of the document, is typically used. One of the key aspects of HTML is the separation of data from the structure used to represent it. As a consequence, data extraction algorithms usually fail to identify data when the structure of a document changes. This paper investigates how a set of tabular-oriented XPath queries can be adapted so that they cope with modifications in the DOM tree of an HTML document. The basic idea is that data already extracted in the past can be used to reconstruct XPath queries that retrieve the same data from a different DOM tree. Experimental results show the accuracy of our method.
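The basic idea can be illustrated with a minimal sketch. This is not the authors' algorithm, only a simplified instance of the principle: values known from an earlier extraction are located in a new DOM tree, and a positional path to each matching node is rebuilt. The function names `build_path` and `regenerate_paths` and the sample page are hypothetical, and the sketch assumes well-formed markup, since Python's `xml.etree.ElementTree` parses XML and supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def build_path(elem, parents):
    """Return an ElementTree-compatible positional path from the root to elem."""
    steps = []
    while elem in parents:
        parent = parents[elem]
        # Position of elem among siblings with the same tag (1-based, as in XPath).
        same_tag = [child for child in parent if child.tag == elem.tag]
        steps.append(f"{elem.tag}[{same_tag.index(elem) + 1}]")
        elem = parent
    return "./" + "/".join(reversed(steps))


def regenerate_paths(root, known_values):
    """Map each previously extracted value to a path in the new tree.

    Only the first node whose text equals a known value is used
    (a simplifying assumption of this sketch).
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    paths = {}
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text in known_values and text not in paths:
            paths[text] = build_path(elem, parents)
    return paths


# Hypothetical page whose layout changed: the table is now wrapped in a <div>.
NEW_PAGE = """<html><body><div><table>
<tr><td>Name</td><td>Price</td></tr>
<tr><td>Widget</td><td>9.99</td></tr>
</table></div></body></html>"""

root = ET.fromstring(NEW_PAGE)
paths = regenerate_paths(root, {"Widget", "9.99"})
for value, path in sorted(paths.items()):
    print(value, "->", path)
```

Evaluating a regenerated path with `root.find(path)` returns the node that holds the known value again. A real setting would additionally have to handle values that occur more than once, values that changed between visits, and generalising the individual paths into a tabular query, none of which this sketch attempts.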
Similar resources
When Grammars do not Suffice: Data and Content Integrity Constraints Verification in XML through a Conceptual Model
Complex applications can benefit greatly from using conceptual models and Model Driven Architecture during development, deployment and runtime. XML applications are not different. In this paper, we examine the possibility of using Object Constraint Language (OCL) for expressing constraints over a conceptual model for XML data. We go through the different classes of OCL expression and show how e...
Testing XPath Queries using Model Checking
XML’s rapid adoption as the data representation standard in web based systems is increasing the interest in applying XML query languages (as XPath) to access XML repositories. This technology entails new challenges related to testing, mainly derived from the hierarchical data representation in XML documents and the expressiveness of the query language. In this paper, we present a technique for ...
A Data Model for Temporal XML Documents
XML is expected to become the next generation standard language for exchanging data over the Internet. In general, the contents of XML documents may change as time goes by, and then, it is important to capture entire histories of those documents. In this paper, we propose a logical data model for representing histories of XML documents. The proposed model extends the XPath data model, and is ca...
LoPiX: A System for XML Data Integration and Manipulation
LOPiX is an implementation of XPathLog [May01b], an XML/XPath-native, rule-based programming language for manipulation and integration of XML documents. The main syntactical constructs are XPath expressions, extended with variables. Due to the close relationship with XPath, the semantics of rules is easy to grasp. In contrast to other approaches, the XPath syntax and semantics is also used for ...
VAMANA: A High Performance, Scalable and Cost Driven XPath Engine
Many applications are migrating to or beginning to make use of native XML data. We anticipate that queries will emerge that emphasize the structural semantics of XML query languages like XPath and XQuery. This brings a need for an efficient query engine and database management system tailored for XML data similar to traditional relational engines. While mapping large XML documents into relational dat...