Single-character type identification

نویسندگان

  • Yefeng Zheng
  • Changsong Liu
  • Xiaoqing Ding
چکیده

Different character recognition problems have their own specific characteristics. The state-of-art OCR technologies take different recognition approaches, which are most effective, to recognize different types of characters. How to identify character type automatically, then use specific recognition engines, has not brought enough attention among researchers. Most of the limited researches are based on the whole document image, a block of text or a text line. This paper addresses the problem of character type identification independent of its content, including handwritten/printed Chinese character identification, and printed Chinese/English character identification, based on only one character. Exploiting some effective features, such as run-lengths histogram features and stroke density histogram features, we have got very promising result. The identification correct rate is higher than 98% in our experiments.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Language Identification using Classifier Ensembles

In this paper we describe the language identification system we developed for the Discriminating Similar Languages (DSL) 2015 shared task. We constructed a classifier ensemble composed of several Support Vector Machine (SVM) base classifiers, each trained on a single feature type. Our feature types include character 1–6 grams and word unigrams and bigrams. Using this system we were able to outp...

متن کامل

How DNA barcoding can be more effective in microalgae identification: a case of cryptic diversity revelation in Scenedesmus (Chlorophyceae)

Microalgae identification is extremely difficult. The efficiency of DNA barcoding in microalgae identification involves ideal gene markers and approaches employed, which however, is still under the way. Although Scenedesmus has obtained much research in producing lipids its identification is difficult. Here we present a comprehensive coalescent, distance and character-based DNA barcoding for 11...

متن کامل

Native Language Identification using Phonetic Algorithms

In this paper, we discuss the results of the IUCL system in the NLI Shared Task 2017. For our system, we explore a variety of phonetic algorithms to generate features for Native Language Identification. These features are contrasted with one of the most successful type of features in NLI, character n-grams. We find that although phonetic features do not perform as well as character n-grams alon...

متن کامل

English , Devnagari and Urdu Text Identification

In a multi-lingual multi-script country like India, a single text line of a document page may contain words of two or more scripts. For the Optical Character Recognition of such a document page it is necessary to identify different scripts from the document. In this paper, an automatic technique for word -wise identification of English, Devnagari and Urdu scripts from a single document is propo...

متن کامل

The Guia Do Pecador: Are There Hidden Ligatures?

An analysis of a scanned copy of the 1599 Japanese edition of the Guia Do Pecador has been carried out in an attempt to ascertain whether certain sequences of characters were printed using single pieces of type. The results of the study strongly indicate that for all the sequences examined, there is no clear evidence that single pieces of type were used, and in many cases it is clear that the c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002