Knowledge-based derivation of document logical structure
Abstract
The analysis of a document image to derive a symbolic description of its structure and contents involves using spatial domain knowledge to classify the different printed blocks (e.g., text paragraphs), group them into logical units (e.g., newspaper stories), and determine the reading order of the text blocks within each unit. These steps describe the conversion of the physical structure of a document into its logical structure. We have developed a computational model for document logical structure derivation, in which a rule-based control strategy utilizes the data obtained from analyzing a digitized document image, and makes inferences using a multi-level knowledge base of document layout rules. The knowledge-based document logical structure derivation system (DeLoS) based on this model consists of a hierarchical rule-based control system to guide the block classification, grouping and read-ordering operations; a global data structure to store the document image data and incremental inferences; and a domain knowledge base to encode the rules governing document layout.
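The three operations named in the abstract (block classification, grouping into logical units, and read-ordering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Block` fields, the font-size threshold, the overlap-based grouping rule, and the column-major ordering are hypothetical stand-ins, not the rules or data structures of DeLoS.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A printed block from the physical layout (hypothetical fields)."""
    name: str
    x: float          # left edge, origin at top-left of the page
    y: float          # top edge
    width: float
    height: float
    font_size: float
    label: str = "unknown"


def classify(blocks, body_font_size=10.0):
    """Illustrative rule: text much larger than body text is a headline."""
    for b in blocks:
        b.label = "headline" if b.font_size >= 1.5 * body_font_size else "paragraph"


def horizontally_overlaps(a, b):
    """True if the horizontal extents of two blocks intersect."""
    return a.x < b.x + b.width and b.x < a.x + a.width


def group(blocks):
    """Illustrative rule: a paragraph belongs to the nearest headline
    above it whose horizontal extent overlaps the paragraph."""
    headlines = [b for b in blocks if b.label == "headline"]
    units = {h.name: [h] for h in headlines}
    for p in blocks:
        if p.label != "paragraph":
            continue
        candidates = [h for h in headlines
                      if h.y <= p.y and horizontally_overlaps(h, p)]
        if candidates:
            owner = max(candidates, key=lambda h: h.y)
            units[owner.name].append(p)
    return units


def reading_order(unit):
    """Illustrative rule: headline first, then columns left to right,
    top to bottom within each column."""
    head = [b for b in unit if b.label == "headline"]
    body = [b for b in unit if b.label != "headline"]
    return head + sorted(body, key=lambda b: (b.x, b.y))


def derive_logical_structure(blocks):
    """Run classification, grouping and read-ordering in sequence."""
    classify(blocks)
    return {name: [b.name for b in reading_order(unit)]
            for name, unit in group(blocks).items()}


# A made-up page: one two-column story and one single-column story,
# listed in arbitrary order as a segmenter might emit them.
page = [
    Block("P3", 105, 30, 95, 200, 10),
    Block("H2", 0, 300, 200, 20, 20),
    Block("P1", 0, 30, 95, 90, 10),
    Block("H1", 0, 0, 200, 20, 24),
    Block("P4", 0, 330, 200, 80, 10),
    Block("P2", 0, 130, 95, 100, 10),
]
structure = derive_logical_structure(page)
print(structure)
```

A real knowledge base would hold many such rules at several levels and apply them under a control strategy. Here each stage is a single hard-coded rule so that the conversion from physical to logical structure is easy to follow.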
Similar resources
Using domain knowledge to derive the logical structure of documents
An important aspect of document understanding is document logical structure derivation, which involves knowledge-based analysis of document images to derive a symbolic description of their structure and contents. Domain-specific as well as generic knowledge about document layout is used in order to classify, logically group, and determine the read-order of the individual blocks in the image, i.e...
The use of document structure analysis to retrieve information from documents in digital libraries
This paper describes an approach to retrieving information from document images stored in a digital library by means of knowledge-based layout analysis and logical structure derivation techniques. Queries on document image content are categorized in terms of the type of information that is desired (e.g., articles on a given topic), and are parsed to determine the type of document from which inf...
Extending the Qualitative Trajectory Calculus Based on the Concept of Accessibility of Moving Objects in the Paths
Qualitative spatial representation and reasoning are among the important capabilities in intelligent geospatial information system development. Although a large contribution to the study of moving objects has been attributed to the quantitative use and analysis of data, such calculations are ineffective when there is little or inaccurate data on position and geometry or when explicitly explaining ...
A Knowledge-based Product Derivation Process and some Ideas how to Integrate Product Development
In this position paper, a product derivation process is described that is based on specifications of known customer requirements, features, and artifacts in a knowledge base. In such a knowledge base, a model of all kinds of variability of a combined software/hardware system is represented using a logic-based representation language. Having such a language, a machinery which interprets t...