Empirical Performance Evaluation Methodology and Its Application to Page Segmentation Algorithms

Authors

  • Song Mao
  • Tapas Kanungo
Abstract

While numerous page segmentation algorithms have been proposed in the literature, there is a lack of comparative evaluation, empirical or theoretical, of these algorithms. In the existing performance evaluation methods, two crucial components are usually missing: 1) automatic training of algorithms with free parameters and 2) statistical and error analysis of experimental results. In this paper, we use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) we create mutually exclusive training and test data sets with groundtruth, 2) we select a meaningful and computable performance metric, 3) an optimization procedure is used to search automatically for the optimal parameter values of the segmentation algorithms on the training data set, 4) the segmentation algorithms are evaluated on the test data set, and 5) a statistical and error analysis is performed to give the statistical significance of the experimental results. In particular, instead of the ad hoc and manual approach typically used in the literature for training algorithms, we pose the automatic training of algorithms as an optimization problem and use the Simplex algorithm to search for the optimal parameter values. A paired-model statistical analysis and an error analysis are then conducted to provide confidence intervals for the experimental results of the algorithms. This methodology is applied to the evaluation of five page segmentation algorithms, of which three are representative research algorithms and the other two are well-known commercial products, on 978 images from the University of Washington III data set. It is found that the performance indices (average textline accuracy) of the Voronoi, Docstrum, and Caere segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which, in turn, is significantly better than...
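The paired analysis in step 5 can be illustrated with a minimal sketch. It assumes each algorithm is scored on the same test images, takes the per-image difference in textline accuracy, and builds a confidence interval for the mean difference using a normal approximation. The function names, the sample accuracies, and the normal approximation are all illustrative assumptions, not the authors' exact procedure.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_confidence_interval(acc_a, acc_b, confidence=0.95):
    """Confidence interval for the mean per-image accuracy difference (A - B)."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both algorithms must be scored on the same images")
    if len(acc_a) < 2:
        raise ValueError("need at least two paired scores")
    # Pairing: compare the two algorithms image by image.
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    # Assumption: normal critical value; a Student-t quantile would be
    # more appropriate when the number of test images is small.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean_diff - z * std_err, mean_diff + z * std_err


def significantly_different(acc_a, acc_b, confidence=0.95):
    """True when the interval for the mean difference excludes zero."""
    low, high = paired_confidence_interval(acc_a, acc_b, confidence)
    return not (low <= 0.0 <= high)


# Made-up textline accuracies (percent) on six test images; not real results.
algo_a = [92.0, 88.5, 95.0, 90.0, 85.5, 93.0]
algo_b = [84.0, 80.0, 88.5, 82.0, 79.0, 85.0]

low, high = paired_confidence_interval(algo_a, algo_b)
print(f"95% CI for mean difference: [{low:.2f}, {high:.2f}]")
print("significantly different:", significantly_different(algo_a, algo_b))
```

Pairing by image removes the variation caused by some pages being harder than others, so the interval reflects only the difference between the two algorithms.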


Similar resources

Empirical performance evaluation of page segmentation algorithms

Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous segmentation algorithms have been proposed, there is relatively less literature on comparative evaluation, empirical or theoretical, of these algorithms. We use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) F...


A Methodology for Empirical Performance Evaluation of Page Segmentation Algorithms

Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous page segmentation algorithms have been proposed, there is relatively less literature on comparative evaluation, empirical or theoretical, of these algorithms. For the existing performance evaluation methods, two crucial components are usually missing: 1) automatic trainin...


Integrating AHP and data mining for effective retailer segmentation based on retailer lifetime value

Data mining techniques have been used widely in the area of customer relationship management (CRM). In this study, we have applied data mining techniques to address a problem in a business-to-business (B2B) setting. In a manufacturer-retailer-consumer chain, a manufacturer should improve its relationship with retailers to continue its business. Segmentation is a useful tool for identifying groups...


Software Architecture of Pset: a Page Segmentation Evaluation Toolkit

Empirical performance evaluation of page segmentation algorithms has become increasingly important due to the numerous algorithms that are being proposed each year. In order to choose between these algorithms for a specific domain, it is important to empirically evaluate their performance. To accomplish this task, the document image analysis community needs i) standardized document image datasets ...


A survey on evaluation methods for image segmentation

This paper studies different methods proposed so far for segmentation evaluation. Most methods can be classified into three groups: the analytical, the empirical goodness and the empirical discrepancy groups. Each group has its own characteristics. After a brief description of each method in every group, some comparative discussions about different method groups are first carried out. An experi...



Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 23

Publication date: 2001