page segmentation

Page Segmentation Using the Description of the Background

Journal: Computer Vision and Image Understanding 1998

Apostolos Antonacopoulos

There is an ever increasing number of publications which do not have the “traditional” layout where printed regions are rectangular. Text paragraphs and areas of graphic type may be of any shape, individually rotated and in any arrangement. Previous document analysis techniques are not well suited to such complex layouts. This paper introduces a new method for the segmentation of images of docu...


Structure detection and segmentation of documents using 2D stochastic context-free grammars

Journal: Neurocomputing 2015

Francisco Alvaro, Francisco Cruz Fernandez, Joan-Andreu Sánchez, Oriol Ramos Terrades, José-Miguel Benedí

In this paper we define a bidimensional extension of Stochastic Context-Free Grammars for structure detection and segmentation of images of documents. Two sets of text classification features are used to perform an initial classification of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of his...


A Unified Algorithm for Identification of Various Tabular Structures from Document Images

Journal: IJDLS 2011

Sekhar Mandal, Amit Kumar Das, Partha Bhowmick, Bhabatosh Chanda

This paper presents a unified algorithm for segmentation and identification of various tabular structures from document page images. Such tabular structures include conventional tables and displayed mathzones, as well as Table of


Page Segmentation Using Script Identification Vectors: A First Look

1997

Judith Hochberg, Michael Cannon, Patrick Kelly, James White

This paper explores the use of script identification vectors in the analysis of multilingual document images. A script identification vector is calculated for each connected component in a document. The vector expresses the closest distance between the component and templates developed for each of thirteen scripts, including Arabic, Chinese, Cyrillic, and Roman. We calculate the first three pri...


A Model for Web Page Usage Mining Based on Segmentation

Journal: CoRR 2011

K. S. Kuppusamy, G. Aghila

Web page usage mining plays a vital role in enriching a page’s content and structure based on the feedback received from users’ interactions with the page. This paper proposes a model for micro-managing the tracking activities by fine-tuning the mining from the page level to the segment level. The proposed model enables the webmaster to identify the segments which receive more focu...


A Quantitative Comparison of Semantic Web Page Segmentation Approaches

2015

Robert Kreuzer, Jurriaan Hage, A. J. Feelders

This paper explores the effectiveness of different semantic web page segmentation algorithms on modern websites. We compare three known algorithms each serving as an example of a particular approach to the problem, and one self-developed algorithm, WebTerrain, that combines two of the approaches. With our testing framework we have compared the performance of four algorithms for a large benchmar...


Markov Random Field Models to Extract The Layout of Complex Handwritten Documents

2006

Stéphane Nicolas, Thierry Paquet, Laurent Heutte

We consider in this paper the problem of complex handwritten page segmentation such as novelist drafts or authorial manuscripts. We propose to use stochastic and contextual models in order to cope with local spatial variability, and to take into account some prior knowledge about the global structure of the document image. The models we propose to use are Markov Random Field models. Using this ...


Performance Comparison of Six Algorithms for Page Segmentation

2006

Faisal Shafait, Daniel Keysers, Thomas M. Breuel

This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, and Voronoi-diagram-based. The evaluation is performed using a subset of the UW-III collection commonly used for evaluation, with a separate training set for parameter optimization. We compare the results using both default param...


Tri-level handwritten text segmentation techniques for Gujarati language

Journal: Indian Journal of Science and Technology 2021

Objectives: To improve the efficiency of tri-level segmentation tasks for handwritten Gujarati text. Methods: Using hybrid segmentation methods, we segment lines, words, and characters from the image. This study presents a paradigm that handles touching characters, sloped lines written on the page, overlapping, etc. It was evaluated on a dataset of 500+ images created by us, containing sentences written by different people. We Hor...


Using tree-grammars for training set expansion in page classification

2003

Stefano Baldi, Simone Marinai, Giovanni Soda

In this paper we describe a method for the expansion of training sets made by XY trees representing page layout. This approach is appropriate when dealing with page classification based on MXY tree page representations. The basic idea is the use of tree grammars to model the variations in the tree which are caused by segmentation algorithms. A set of general grammatical rules are defined and us...
