Research of Chinese Handwritten Text Segmentation Algorithm
Abstract
OCR is a complicated process, and many factors influence the recognition rate. Early work tried to optimize the classifier to obtain a high recognition rate, on the premise that the input contains only one character, whether printed or handwritten. Because classifier performance has improved considerably, the recognition rate for single characters is now high enough for commercial use. With growing demand for handwritten text recognition, raising the recognition rate of the whole OCR system has become very important. Unlike OCR systems for printed text, which focus on the classifier, research on OCR systems for handwritten text concentrates on character segmentation. Statistical analysis shows that missegmentation causes more errors than the classifier does. This follows from the nature of handwritten text: it is more irregular and its lines are not horizontal; in addition, handwritten Chinese characters are more likely to overlap, and the gaps between characters are smaller. These properties are what make handwritten Chinese text difficult to segment. This paper proposes a multi-step searching nonlinear line extraction algorithm that is simple and highly accurate, and that addresses some weaknesses of the direct projection and indirect projection methods.
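To make the weakness of direct projection concrete, here is a minimal sketch of a horizontal projection baseline. It is not the multi-step searching algorithm proposed in the paper, which the abstract does not describe in detail. The function names `row_profile` and `split_lines`, the `min_ink` parameter, and the toy images are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def row_profile(image: Image) -> List[int]:
    """Count ink pixels in each row (the horizontal projection)."""
    return [sum(row) for row in image]


def split_lines(image: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) bands of consecutive rows
    that each contain at least `min_ink` ink pixels."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(row_profile(image)):
        if count >= min_ink and start is None:
            start = y                      # a band of inked rows begins
        elif count < min_ink and start is not None:
            bands.append((start, y))       # a blank row closes the band
            start = None
    if start is not None:
        bands.append((start, len(image)))  # band runs to the bottom edge
    return bands


# Two horizontal text lines separated by a blank row.
straight: Image = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# Two descending (skewed) text lines whose vertical extents overlap:
# row 2 holds the end of the upper line and the start of the lower one.
skewed: Image = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

straight_bands = split_lines(straight)
skewed_bands = split_lines(skewed)
```

On the straight image the profile has a blank row, so the two lines come out as separate bands. On the skewed image no row is blank, so the projection merges both lines into a single band, which is the kind of failure a nonlinear line extraction method is meant to avoid.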
Similar resources
Recognition-based handwritten Chinese character segmentation using a probabilistic Viterbi algorithm
This paper presents a recognition-based character segmentation method for handwritten Chinese characters. Possible non-linear segmentation paths are initially located using a probabilistic Viterbi algorithm. Candidate segmentation paths are determined by verifying overlapping paths, between-character gaps, and adjacent-path distances. A segmentation graph is then constructed using candidate pat...
The Horizontal Segmentation of Lines in Chinese Handwritten Texts Based on the Intervals (Distances) in Fuzzy Triangles
The horizontal segmentation of handwritten text lines is a key step in detecting slanted handwritten text. In this paper, a novel method based on fuzzy triangles is proposed to group and connect the text lines. The proposed method has been tested on Chinese-language data sets. In the experiments on Chinese handwritten texts, a performance of 94.53% was obtained. Abbrev...
Handwritten Text Line Segmentation by Clustering with Distance Metric Learning
Separating text lines in handwritten documents remains a challenge because the text lines are often non-uniformly skewed and curved. In this paper, we propose a novel text line segmentation algorithm based on Minimal Spanning Tree (MST) clustering with distance metric learning. Given a distance metric, the connected components of the document image are grouped into a tree structure. Text lines are ex...
Recent Results of Online Japanese Handwriting Recognition and Its Applications
This paper discusses online handwriting recognition of Japanese characters, a mixture of ideographic characters (Kanji) of Chinese origin, and the phonetic characters made from them. Most Kanji character patterns are composed of multiple subpatterns, called radicals, which are shared among many (sometimes hundreds of) Kanji character patterns. This is common in Oriental languages of Chinese ori...
Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context
Creating document image datasets with ground-truths of regions, text lines and characters is a prerequisite for document analysis research. However, ground-truthing large datasets is not only laborious and time consuming but also prone to errors due to the difficulty of character segmentation and the large variability of character shape, size and position. This paper describes an effective reco...