XML Mixed Content Grammars

Authors

  • Pim van der Eijk
  • Dennis Janssen
Abstract

Extensible Markup Language documents are composed of sequences of character data and embedded markup, under control of a Document Type Definition. We will show that embedded markup needs to be dealt with at a syntactic level to account for interdependencies between linguistic structure and document structure as found in style guides and Controlled Languages. Furthermore, we discuss the effect of XML content manipulation by NLP tools on XML document integrity, and outline conditions under which the NLP system preserves XML well-formedness and validity.

Similar resources

Balanced Context-Free Grammars, Hedge Grammars and Pushdown Caterpillar Automata

The XML community generally takes trees and hedges as the model for XML document instances and element content. In contrast, Berstel and Boasson have discussed XML documents in the framework of extended context-free grammars, modeling XML documents as Dyck strings and schemas as balanced grammars. How can these two models be brought closer together? We examine the close relationship between Dyck ...

Grammar Inference for Web Documents

Presentational XML documents, such as XHTML or Presentation MathML, use XML tags mainly for formatting purposes, while descriptive XML applications, such as a well-structured movie database, use tags to structure data items in a semantically meaningful way. There is little semantic connection between tags in a presentational XML document and its content, so the tagging is often complex and seemin...

Reasoning about XML Schema Languages Using Formal Language Theory

A mathematical framework using formal language theory to describe and compare XML schema languages is presented. Our framework uses the work in two related areas: regular tree languages [CDG+97] and ambiguity in regular expressions [BEGO71, BKW98]. Using this work as well as the content in two classical references [HU79, AU79], we present the following results: (1) a normal form representation...

Complexity of Context-Free Grammars with Exceptions

This report has been submitted for publication outside of ITC and will probably be copyrighted if accepted for publication. It has been issued as a Technical Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of ITC prior to publication should be limited to peer communications and specific requests. Afte...

Complexity of Context-Free Grammars with Exceptions and

This report has been submitted for publication outside of ITC and will probably be copyrighted if accepted for publication. It has been issued as a Technical Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of ITC prior to publication should be limited to peer communications and specific requests. Afte...

Publication date: 1998