GWU-HASP: Hybrid Arabic Spelling and Punctuation Corrector
Abstract
In this paper, we describe our Hybrid Arabic Spelling and Punctuation Corrector (HASP). HASP was one of the systems participating in the QALB-2014 Shared Task on Arabic Error Correction. The system uses a CRF (Conditional Random Fields) classifier for correcting punctuation errors, an open-source dictionary (or word list) for detecting errors and generating and filtering candidates, an n-gram language model for selecting the best candidates, and a set of deterministic rules for text normalization (such as removing diacritics and kashida and converting Hindi numbers into Arabic numerals). We also experiment with word alignment for spelling correction at the character level and report some preliminary results.
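The deterministic normalization rules named in the abstract (removing diacritics and kashida, converting Hindi numbers into Arabic numerals) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `normalize`, the exact Unicode ranges, and the reading of "Hindi numbers" as the Arabic-Indic digits U+0660 to U+0669 are assumptions made for illustration only.

```python
import re

# Assumption: "diacritics" covers the common tashkeel marks,
# fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")

# Kashida (tatweel) is the elongation character U+0640.
KASHIDA = "\u0640"

# Assumption: "Hindi numbers" are the Arabic-Indic digits U+0660..U+0669,
# mapped here to the ASCII digits 0-9.
DIGIT_MAP = {ord("\u0660") + i: str(i) for i in range(10)}


def normalize(text: str) -> str:
    """Apply three deterministic normalization rules to Arabic text."""
    text = DIACRITICS.sub("", text)      # strip diacritics
    text = text.replace(KASHIDA, "")     # strip kashida
    return text.translate(DIGIT_MAP)     # convert digits
```

Because each rule is a plain character-level substitution, the rules can run in any order and do not depend on the classifier, dictionary, or language model components.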
Similar resources
GWU-HASP-2015@QALB-2015 Shared Task: Priming Spelling Candidates with Probability
In this paper, we describe our system HASP-2015 (Hybrid Arabic Spelling and Punctuation Corrector) in which we introduce significant improvements over our previous version HASP-2014 and with which we participated in the QALB2015 Second Shared Task on Arabic Error Correction. Our system utilizes probabilistic information on errors and their possible corrections in the training data and combine t...
QCMUQ@QALB-2015 Shared Task: Combining Character level MT and Error-tolerant Finite-State Recognition for Arabic Spelling Correction
We describe the CMU-Q and QCRI’s joint efforts in building a spelling correction system for Arabic in the QALB 2015 Shared Task. Our system is based on a hybrid pipeline that combines rule-based linguistic techniques with statistical methods using language modeling and machine translation, as well as an error-tolerant finite-state automata method. We trained and tested our spelling corrector us...
Automatic Arabic Spelling Errors Detection and Correction Based on Confusion Matrix-Noisy Channel Hybrid System
Arabic spelling errors occur in different types of documents, such as those handwritten by inexperienced users, optical character recognition (OCR) documents, and machine-translated documents. Many researchers have tried to solve this problem, but so far there is no radical solution. This paper proposes a hybrid system based on the confusion matrix and the noisy channel spelling correction model t...
Flexible and Hybrid Action Selection Process for the Control of Highly Dynamic Multi-Robot Systems
This chapter presents a behavioral mechanism of control in order to break the complexity of multi-robot control systems. Specifically, this chapter proposes a Hierarchical Action Selection Process (HASP) which aims to coordinate a set of elementary controllers endowed in behavioral control architectures. This process allows at the scale of the robot to coordinate in a hierarchical and flexible ...
Detecting and Correcting Morpho-syntactic Errors in Real Texts
This paper presents a system which detects and corrects morpho-syntactic errors in Dutch texts. It includes a spelling corrector and a shift-reduce parser for Augmented Context-free Grammars. The spelling corrector is based on trigram and triphone analysis. The parser is an extension of the well-known Tomita algorithm (Tomita, 1986). The parser interacts with the spelling corrector and handles ...