Online Persian Handwriting Recognition Using a Language Model and Reducing User Writing Rules

Authors

Abstract:

The joined, cursive form of Persian words, the wide variety of its scripts, and the different shapes a letter takes depending on its position in a word make Persian handwriting recognition a considerable challenge. The main weakness of most recognition methods is that they ignore sentence context: when an input word is misrecognized, a word that looks plausible on its own ends up in a sentence where it does not fit. A solution that analyzes sentence context properly requires large linguistic resources that represent the target language well. This article presents a new method for online recognition of Persian words that tries to improve recognition by using the context of each word.
The Persian vocabulary is divided into two groups. The first group contains the words whose sub-words are all supported by the database of handwritten sub-word classes. These words make up 68.2% of the total vocabulary, and the hypotheses scored at the recognition stage are members of this group. The second group contains the words that the database does not support. If the recognition system ignored this group, more than 30 percent of the words of the language would be unrecognizable. At the recognition stage, the signs of the input word are detected and a sign label is produced. Using the same label, the words that carry the same signs as the input word are selected from the second group, that is, from the words not supported at the recognition stage. Hypotheses are scored by combining recognition scores with language-model scores. For these additional words a recognition score cannot be computed directly, because the sub-words of these hypotheses are absent from the database. The score of the words recognized in the previous steps is therefore used instead.
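The vocabulary split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a sub-word is a maximal run of connected letters, that a run ends after a letter that does not join to its successor, and that the database can be modeled as a plain set of sub-word strings. The names `split_subwords`, `partition_vocabulary`, and `supported_subwords` are hypothetical.

```python
# Letters that do not connect to the following letter (assumed list):
# alef, alef with madda, dal, zal, re, ze, zhe, vav.
NON_JOINING = set("\u0627\u0622\u062f\u0630\u0631\u0632\u0698\u0648")


def split_subwords(word):
    """Split a word into connected runs of letters (assumed sub-word definition)."""
    subwords, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_JOINING:
            # A non-joining letter closes the current connected run.
            subwords.append(current)
            current = ""
    if current:
        subwords.append(current)
    return subwords


def partition_vocabulary(vocabulary, supported_subwords):
    """Return (supported, unsupported) word lists.

    A word is supported only if every one of its sub-words is in the
    (hypothetical) set of sub-word classes covered by the database.
    """
    supported, unsupported = [], []
    for word in vocabulary:
        if all(sw in supported_subwords for sw in split_subwords(word)):
            supported.append(word)
        else:
            unsupported.append(word)
    return supported, unsupported
```

With a real sub-word database, the share of `supported` words would correspond to the 68.2% figure reported in the abstract.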
The studies showed that when the input word belongs to the supported vocabulary, the correct word is in most cases among the first four hypotheses, even if the top recognition result is wrong. The scores of the first few hypotheses are usually close to one another, while the remaining hypotheses score far from the correct one. Because the system operates online, unnecessary computation should be avoided. Therefore, if the recognition stage produces more than four hypotheses, only the first four are passed to the language model. The new hypotheses also need a recognition score: if the recognition stage yields fewer than four hypotheses, the lowest hypothesis score is used as the recognition score of the new hypotheses; otherwise, the hypothesis score is used. As with the earlier hypotheses, a language-model score is then computed for each new hypothesis, and a final score is obtained for every hypothesis. The hypothesis with the highest final score is taken as the system output, and the remaining hypotheses are shown to the user as alternatives. Experiments show that even when the output is wrong, the correct word usually appears as the second hypothesis and sometimes as the third. The method also aims to reduce the restrictions and writing rules that users are required to follow. In the proposed method, the signs and the morpheme structure of the input handwriting are first separated, and the structure of each morpheme together with its signs is determined. The signs of the morphemes are then identified, and based on them a set of words is taken as hypotheses. Each hypothesis is scored by its similarity to the input handwriting, and the most likely hypotheses are selected according to these scores. Language models are then applied to arrive at the more likely hypotheses.
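The hypothesis-handling steps above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the names `Hypothesis` and `rescore`, the `language_model` callable, and the equal-weight sum are all hypothetical, and the scores are assumed to be already on a comparable scale. The abstract does not say which score the new hypotheses receive when four hypotheses are available; the sketch assumes the lowest retained score in both cases.

```python
from dataclasses import dataclass

MAX_HYPOTHESES = 4  # only the first four recognition hypotheses are kept


@dataclass
class Hypothesis:
    word: str
    recognition_score: float
    language_score: float = 0.0


def rescore(recognized, extra_words, language_model, weight=0.5):
    """Rank hypotheses by a weighted sum of recognition and language scores.

    recognized:     list of (word, recognition_score) pairs
    extra_words:    words from the unsupported vocabulary to add as hypotheses
    language_model: callable mapping a word to a language score (assumed)
    weight:         share of the recognition score in the final score (assumed)
    """
    # Keep at most the four best recognition hypotheses.
    kept = sorted(recognized, key=lambda p: p[1], reverse=True)[:MAX_HYPOTHESES]

    # Assumption: new hypotheses inherit the lowest retained recognition score.
    fallback = min((score for _, score in kept), default=0.0)

    candidates = [Hypothesis(w, s) for w, s in kept]
    known = {w for w, _ in kept}
    candidates += [Hypothesis(w, fallback) for w in extra_words if w not in known]

    for h in candidates:
        h.language_score = language_model(h.word)

    def final_score(h):
        return weight * h.recognition_score + (1 - weight) * h.language_score

    # Best hypothesis first; the rest can be shown to the user as alternatives.
    return sorted(candidates, key=final_score, reverse=True)
```

The first element of the returned list plays the role of the system output, and the remaining elements are the alternatives displayed to the user.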
Because the individual scores of a hypothesis are on different scales, a score normalization method is proposed for combining them. The results show that adding a language model to an online handwriting recognition system significantly reduces the word recognition error rate. In addition to reducing the error rate, the language model makes possible a technique that can handle the entire Persian vocabulary. With the proposed method, recognition accuracy reaches 95.9% at the initial, letter-level stage and improves to 99.3% with the language model. Thus, large linguistic resources for Persian combined with a language model can improve recognition accuracy. As future work, a reinforcement learning algorithm is suggested for adapting the method to individual users.
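The abstract does not say which normalization is used. As one possible illustration, the sketch below applies min-max normalization to each score list before taking a weighted sum. Both the choice of min-max scaling and the names `min_max_normalize` and `combine_scores` are assumptions made for this example.

```python
def min_max_normalize(scores):
    """Map scores linearly onto [0, 1] (assumed normalization, not the paper's)."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so no hypothesis is preferred on this scale.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(recognition_scores, language_scores, weight=0.5):
    """Normalize both score lists, then take a weighted sum per hypothesis.

    weight is the share given to the recognition score (assumed parameter).
    """
    rec = min_max_normalize(recognition_scores)
    lm = min_max_normalize(language_scores)
    return [weight * r + (1 - weight) * l for r, l in zip(rec, lm)]


# Example: similarity scores on one scale, log-probabilities on another.
final = combine_scores([10.0, 20.0, 30.0], [-8.0, -2.0, -6.0])
best_index = final.index(max(final))
```

Normalizing first keeps a score with a large numeric range, such as a log-probability, from dominating the sum simply because of its scale.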

similar resources

Online Recognition of Persian Letters Using Structural Features

This paper presents the grouping and recognition of isolated Persian letters written online, based on their structural features. The letters are divided into 9 groups according to the shape and written structure of their main body. After feature extraction, grouping is performed with a decision tree. Final recognition of the letters is carried out according to the structure of their small components within each group. Considering that in this paper the method...

Persian Text Recognition Using an n-gram Language Model and Grammatical Filtering

Text recognition has been one of the growing research topics in recent years. Much of this research has focused on the recognition of letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...

Online Recognition of Isolated Handwritten Persian Characters Based on Main-Body Group Detection Using a Support Vector Machine

In this paper a new method for the online recognition of handwritten Persian characters has been proposed which uses a set of simple features and Support Vector Machine (SVM) as a classifier. The task of preprocessing allows us to equalize feature vectors from different characters. This algorithm is implemented in two steps. In the first step, input character is classified into one of eighteen ...

Online Recognition of Persian Sub-words Based on Freeman Chain Code Features Using a Hidden Markov Model

This paper attempts the online recognition of Persian sub-words using Freeman chain codes and a hidden Markov model. By using the direction of the breakpoints, chain codes reduce the volume of data while preserving the direction of pen movement. They can therefore serve as an effective method for online sub-word recognition. After the sub-word is broken into its constituent parts (main body and small strokes), using Freeman chain codes, each ...

Journal title

volume 14  issue 2

pages  3- 24

publication date 2017-09
