XML Mixed Content Grammars
Authors
Abstract
Extensible Markup Language documents are composed of sequences of character data and embedded markup, under control of a Document Type Definition. We will show that embedded markup needs to be dealt with at a syntactic level to account for interdependencies between linguistic structure and document structure as found in style guides and Controlled Languages. Furthermore, we discuss the effect of XML content manipulation by NLP tools on XML document integrity, and outline conditions under which the NLP system preserves XML well-formedness and validity.
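As a rough illustration of the integrity concern raised in the abstract, the following Python sketch contrasts two ways a text-processing tool might touch mixed content. It is not the paper's method. The sample paragraph, the helper names `transform_text` and `is_well_formed`, and the use of upper-casing as a stand-in for an NLP rewrite are all assumptions made for illustration. The sketch checks well-formedness only; validity against a DTD would need a validating parser, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: character data interleaved with embedded markup.
SAMPLE = "<p>Press the <b>red</b> button to <i>stop</i> the engine.</p>"


def transform_text(elem, fn):
    """Apply fn to every character-data segment, leaving the markup untouched."""
    if elem.text:
        elem.text = fn(elem.text)
    for child in elem:
        transform_text(child, fn)
        # Text that follows a child element is stored on that child's tail.
        if child.tail:
            child.tail = fn(child.tail)


def is_well_formed(xml_string):
    """Return True if the string parses as XML (well-formedness only)."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


# Structure-aware edit: only text nodes change, so the tags stay balanced.
root = ET.fromstring(SAMPLE)
transform_text(root, str.upper)
safe = ET.tostring(root, encoding="unicode")

# Naive string-level edit (assumed failure mode): the replacement drops a
# closing tag because it treats markup as ordinary characters.
naive = SAMPLE.replace("the <b>red</b>", "the <b>red")

print(safe)
print(is_well_formed(safe), is_well_formed(naive))
```

The structure-aware variant edits the `text` and `tail` segments separately, so it cannot see a phrase that spans a tag boundary, such as "the red button". That limitation is one reason a tool may need to handle embedded markup at the syntactic level rather than strip it.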
Similar resources
Balanced Context-Free Grammars, Hedge Grammars and Pushdown Caterpillar Automata
The XML community generally takes trees and hedges as the model for XML document instances and element content. In contrast, Berstel and Boasson have discussed XML documents in the framework of extended context-free grammars, modeling XML documents as Dyck strings and schemas as balanced grammars. How can these two models be brought closer together? We examine the close relationship between Dyck ...
Grammar Inference for Web Documents
Presentational XML documents, such as XHTML or Presentation MathML, use XML tags mainly for formatting purposes, while descriptive XML applications, such as a well-structured movie database, use tags to structure data items in a semantically meaningful way. There is little semantic connection between tags in a presentational XML document and its content, so the tagging is often complex and seemin...
Reasoning about XML Schema Languages Using Formal Language Theory
A mathematical framework using formal language theory to describe and compare XML schema languages is presented. Our framework uses the work in two related areas: regular tree languages [CDG+97] and ambiguity in regular expressions [BEGO71, BKW98]. Using this work as well as the content in two classical references [HU79, AU79], we present the following results: (1) a normal form representation...
Complexity of Context-Free Grammars with Exceptions
This report has been submitted for publication outside of ITC and will probably be copyrighted if accepted for publication. It has been issued as a Technical Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of ITC prior to publication should be limited to peer communications and specific requests. Afte...
Complexity of Context-Free Grammars with Exceptions and ...
This report has been submitted for publication outside of ITC and will probably be copyrighted if accepted for publication. It has been issued as a Technical Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of ITC prior to publication should be limited to peer communications and specific requests. Afte...