Learning Schemas for Unordered XML
Authors
Abstract
We consider unordered XML, where the relative order among siblings is ignored, and we investigate the problem of learning schemas from examples given by the user. We focus on the schema formalisms proposed in [10]: disjunctive multiplicity schemas (DMS) and their restriction, disjunction-free multiplicity schemas (MS). A learning algorithm takes as input a set of XML documents that must satisfy the schema (i.e., positive examples) and a set of XML documents that must not satisfy the schema (i.e., negative examples), and returns a schema consistent with the examples. We investigate a learning framework inspired by Gold [18], in which a learning algorithm should be sound, i.e., always return a schema consistent with the examples given by the user, and complete, i.e., able to produce every schema given a sufficiently rich set of examples. Additionally, the algorithm should be efficient, i.e., polynomial in the size of the input. We prove that DMS are learnable from positive examples only, but are not learnable when negative examples are also allowed. Moreover, we show that MS are learnable both from positive examples only and from positive and negative examples together. Furthermore, for the learnable cases, the proposed learning algorithms return minimal schemas consistent with the examples.
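To make the positive-examples setting concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that a disjunction-free multiplicity schema can be pictured as a map from each parent label to one multiplicity per child label, drawn from 0, 1, ?, + and *; the abstract does not spell this out. The function names `multiplicity` and `learn_ms` are hypothetical, and the root label is ignored.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def multiplicity(counts):
    """Tightest of 0, 1, ?, +, * that covers every observed count."""
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return "0"
    if lo >= 1:
        return "1" if hi == 1 else "+"
    return "?" if hi == 1 else "*"


def learn_ms(documents):
    """Infer per-parent child multiplicities from positive examples only.

    Assumption (not from the source): a schema is a dict
    parent label -> {child label -> multiplicity}.
    """
    observations = {}  # parent label -> list of child-label bags
    for doc in documents:
        for node in ET.fromstring(doc).iter():
            bag = Counter(child.tag for child in node)  # sibling order is ignored
            observations.setdefault(node.tag, []).append(bag)

    schema = {}
    for parent, bags in observations.items():
        labels = set().union(*bags)  # every child label seen under this parent
        schema[parent] = {
            label: multiplicity([bag[label] for bag in bags])
            for label in sorted(labels)
        }
    return schema


docs = [
    "<book><title/><author/><author/></book>",
    "<book><author/><title/><year/></book>",
]
schema = learn_ms(docs)
print(schema["book"])
```

Because each node's children are collected into a bag of labels, the two example documents contribute the same kind of evidence regardless of the order in which siblings appear. Negative examples and disjunction are outside the scope of this sketch.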
Similar resources
XML schemas without order
XML schemas consist of context-free grammars that allow regular expressions on the right-hand side of productions. In the schema definition language ScmDL, XML schemas are enhanced to, among other things, mark nodes as ordered or as unordered. An unordered node is then derived by a production with a regular expression r if the string induced by its children belongs to the symmetric closure of r...
An Experiment on the Matching and Reuse of XML Schemas
XML Schema is becoming an indispensable component in developing web applications. With its widespread adoption and its web accessibility, XML Schema reuse is becoming imperative. To support XML Schema reuse, the first step is to develop a mechanism to search for relevant XML Schemas over the web. This paper describes an XML Schema matching system that compares two XML Schemas. Our matching system ...
Simple Schemas for Unordered XML
We consider unordered XML, where the relative order among siblings is ignored, and propose two simple yet practical schema formalisms: disjunctive multiplicity schemas (DMS), and its restriction, disjunction-free multiplicity schemas (MS). We investigate their computational properties and characterize the complexity of the following static analysis problems: schema satisfiability, membership of...
Efficient Subtyping for Unordered XML Types
While XML is an ordered data format, many applications outside the document processing area simply drop ordering and manipulate XML data as if they were unordered. In these contexts, XML is essentially used as a way of representing unordered, unranked trees. The wide use of unordered XML data should be coupled with a careful and detailed analysis of their theoretical properties. One of the o...
Approximate Common Structures in XML Schema Matching
This paper describes a matching algorithm that can find accurate matches and scales to large XML Schemas with hundreds of nodes. We model XML Schemas as labeled, unordered and rooted trees, and turn the schema matching problem into a tree matching problem. We develop a tree matching algorithm based on the concept of Approximate Common Structures. Compared with the tree edit-distance algorithm a...