Tracking Sub-page Components within Document Workflows
Abstract
Documents go through numerous transformations and intermediate formats as they are processed from abstract markup into final printable form. The notion of a document workflow is well established, but ideas about document components, which may exist in the document's source code, commonly become completely lost within an amorphous, unstructured page of PDF before it is rendered. Given the importance of a component-based approach in Variable Data Printing (VDP), we have developed a collection of tools that embed information about the various transformations at each stage of the workflow, together with a visualization tool that uses this embedded information to display the relationships between the intermediate documents. In this paper, we demonstrate these tools in the context of an example document workflow, but the techniques described are widely applicable: they would be easily adaptable to other workflows and to teaching tools that illustrate document-component and VDP concepts.
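The abstract does not say how the per-stage information is encoded. One plausible scheme, shown here purely as an assumption, is for each stage to record which input element every output element came from; a visualizer can then chain these records to trace an object on the final PDF page back to the source component it represents. `StageRecord`, `trace_back`, and the element IDs below are hypothetical names for illustration, not part of the authors' tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StageRecord:
    """Provenance embedded by one workflow stage (hypothetical format).

    `name` labels the transformation; `origin` maps each element ID the
    stage produced to the ID of the input element it was derived from.
    """
    name: str
    origin: Dict[str, str] = field(default_factory=dict)


def trace_back(element_id: str, stages: List[StageRecord]) -> List[str]:
    """Follow an element from the last stage back toward the source.

    Returns the chain of IDs, newest first. Stops early if a stage holds
    no record for the current ID, i.e. the component identity was lost there.
    """
    chain = [element_id]
    current = element_id
    for stage in reversed(stages):  # walk from the final output backwards
        parent = stage.origin.get(current)
        if parent is None:
            break
        chain.append(parent)
        current = parent
    return chain


# Hypothetical two-stage workflow: source markup -> layout -> PDF.
stages = [
    StageRecord("source->layout", {"frame-7": "comp:address-block"}),
    StageRecord("layout->pdf", {"pdf-obj-42": "frame-7"}),
]
print(trace_back("pdf-obj-42", stages))
```

Keeping one small mapping per stage, rather than a single global table, matches the idea of embedding information at each stage: every tool only needs to know its own inputs and outputs, and the full chain emerges when the records are composed.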