Offline Pashto Characters Dataset for OCR Systems
Authors
Abstract
In computer vision and artificial intelligence, text recognition and analysis based on images play a key role in the text retrieval process. Enabling a machine learning technique to recognize the handwritten characters of a specific language requires a standard dataset. Acceptable handwritten character datasets are available for many languages, including English, Arabic, and more. However, the lack of such a dataset for Pashto hinders the application of a suitable machine learning algorithm for recognizing useful insights. In order to address this issue, this study presents the first handwritten Pashto characters image dataset (HPCID) for scientific research work. This dataset consists of fourteen thousand, seven hundred, and eighty-four (14,784) samples: 336 samples for each of the 44 characters. The samples were collected on A4-sized paper from different students of a department at the University of Peshawar, Khyber Pakhtunkhwa, Pakistan. In total, 336 contributors, including faculty members, took part in the accumulation phase of developing the proposed database. The dataset contains multisize, multifont, and multistyle characters of varying structures.
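The sample count stated in the abstract can be checked with a few lines of Python. This is only a sketch: the two constants come from the abstract, while the function names and the integer class labels (0 to 43) are hypothetical and do not reflect how the HPCID files are actually organized.

```python
from collections import Counter

NUM_CLASSES = 44          # number of characters, as stated in the abstract
SAMPLES_PER_CLASS = 336   # samples per character, as stated in the abstract


def expected_total(num_classes=NUM_CLASSES, samples_per_class=SAMPLES_PER_CLASS):
    """Total number of images implied by a perfectly balanced dataset."""
    return num_classes * samples_per_class


def is_balanced(labels, num_classes=NUM_CLASSES, samples_per_class=SAMPLES_PER_CLASS):
    """Check that every class appears exactly samples_per_class times."""
    counts = Counter(labels)
    return (len(counts) == num_classes
            and all(n == samples_per_class for n in counts.values()))


# Hypothetical label list: integer class ids 0..43, each repeated 336 times.
labels = [c for c in range(NUM_CLASSES) for _ in range(SAMPLES_PER_CLASS)]

print(expected_total())     # 14784, matching the figure in the abstract
print(is_balanced(labels))  # True
```

A balance check of this kind is a common first step before splitting a character dataset into training and test sets, since a balanced dataset allows a simple per-class split.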
Similar resources
UCOM Offline Dataset: An Urdu Handwritten Dataset Generation
A benchmark database for character recognition is an essential part of efficient and robust development. Unfortunately, there is no comprehensive handwritten dataset for the Urdu language that could be used to compare state-of-the-art techniques in the field of optical character recognition. In this paper, we present a new and publicly available dataset comprising 600 pages of handwritten Ur...
Probabilistic Retrieval Methods for Text with Miss-Recognized OCR Characters
This paper presents two probabilistic text retrieval methods specifically designed to carry out a full-text search of Japanese documents containing OCR errors. By searching for any query term under the premise that errors exist in recognized text, the presented methods can tolerate such errors, and therefore manual post-editing is not required after OCR recognition. In the applied approach, conf...
Retrieval methods for English-text with missrecognized OCR characters
This paper presents three probabilistic text retrieval methods designed to carry out a full-text search of English documents containing OCR errors. By searching for any query term on the premise that there are errors in the recognized text, the methods presented can tolerate such errors, and therefore costly manual postediting is not required after OCR recognition. In the applied approach, conf...
Hybrid Off-line OCR for Isolated Handwritten Greek Characters
In this paper, we present an off-line OCR methodology for isolated handwritten Greek characters mainly based on a robust hybrid feature extraction scheme. First, image pre-processing is performed in order to normalize the character images as well as to correct character slant. At the next step, two types of features are combined in a hybrid fashion. The first one divides the character image int...
OCR Error Rate Versus Rejection Rate for Isolated Handprint Characters
Over twenty-five [...] participating in the First Census OCR Systems Conference submitted confidence data ...
Journal
Journal title: Security and Communication Networks
Year: 2021
ISSN: 1939-0122, 1939-0114
DOI: https://doi.org/10.1155/2021/3543816