Simultaneous detection of vertical and horizontal text lines based on perceptual organisation

Authors

  • Claudie Faure
  • Nicole Vincent
Abstract

A page of a document is a set of small components which a human reader groups into higher-level components, such as lines and text blocks. Document image analysis aims to detect these components in document images. We propose encoding local information by considering the properties that determine perceptual grouping. Each connected component is labelled according to the location of its nearest neighbouring connected component. These labelled components constitute the input of a rule-based incremental process. Vertical and horizontal text lines are detected without any prior assumption about their direction. Touching characters that belong to different lines are detected early and excluded from the grouping process to avoid line merging. The tolerance for grouping components increases over the course of the process until the final decision. After each step of the grouping process, conflict resolution rules are activated. This work was motivated by the automatic detection of Figure&Caption pairs in the documents of the historical collection of the BIUM digital library (Bibliothèque InterUniversitaire Médicale). The images used in this study belong to this collection.
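The labelling step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the set of labels, so everything below is an assumption made for illustration: components are represented by bounding boxes (the hypothetical `Box` class), the distance is the gap between boxes, and the label is one of four directions chosen from the offset between box centres. It is not the authors' implementation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Bounding box of a connected component (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def gap(a, b):
    """Euclidean gap between two boxes; 0 if they touch or overlap (assumed metric)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0)
    return hypot(dx, dy)


def nearest_neighbour_label(boxes, i):
    """Return (index, direction) of the nearest neighbour of boxes[i]."""
    a = boxes[i]
    others = [j for j in range(len(boxes)) if j != i]
    if not others:
        return None, "none"
    best = min(others, key=lambda j: gap(a, boxes[j]))
    (ax, ay), (bx, by) = a.center, boxes[best].center
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        # Image coordinates: y grows downward.
        direction = "below" if dy > 0 else "above"
    return best, direction


def label_components(boxes):
    """Label every component with the direction of its nearest neighbour."""
    return [nearest_neighbour_label(boxes, i)[1] for i in range(len(boxes))]


# Three components in a row, then two components stacked in a column.
row = [Box(0, 0, 10, 10), Box(12, 0, 22, 10), Box(24, 0, 34, 10)]
column = [Box(0, 0, 10, 10), Box(0, 13, 10, 23)]
row_labels = label_components(row)
column_labels = label_components(column)
```

In this sketch, components whose nearest neighbour lies to the left or right would be candidates for a horizontal line, and those with a neighbour above or below would be candidates for a vertical line. The rule-based grouping and conflict resolution mentioned in the abstract are not reproduced here.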


Similar resources

Detection of power oscillation and simultaneous faults using Clark transform

Distance relays are widely used to protect transmission lines. Sometimes a power oscillation on these lines causes the impedance calculated by the distance relay to enter its operating zones, which leads to the lines being tripped. This can cause widespread power outages. Accordingly, in this paper, a Clark-based method for detecting the oscillation of power...


Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...


Skew detection and text line position determination in digitized documents

This paper proposes a computationally efficient procedure for skew detection and text line position determination in digitized documents, which is based on the cross-correlation between the pixels of vertical lines in a document. The determination of the skew angle in documents is essential in optical character recognition systems. Due to the text skew, each horizontal text line intersects a p...


Text detection in video frames

In this paper we present the state of the art for detecting text in images and video frames and propose an edge-based algorithm for artificial text detection in video frames. First, an edge map is created using the Canny edge detector. Then, morphological filtering is used, based on geometrical constraints, in order to connect the vertical edges and discard false alarms. A connected component a...


Detecting Text in Video Frames

In this paper we propose an edge-based algorithm for artificial text detection in video frames. First, an edge map is created using the Canny edge detector. Then, morphological filtering is used, based on geometrical constraints, in order to connect the vertical edges and discard false alarms. A connected component analysis is performed to the filtered edge map in order to determine a bounding ...




Publication date: 2009