text correction

Extended HMM and Ranking Models for Chinese Spelling Correction

2014

Jinhua Xiong Qiao Zhang Jianpeng Hou Qianbo Wang Yuanzhuo Wang Xueqi Cheng

Spelling correction has been studied for many decades and can be classified into two categories: (1) regular text spelling correction and (2) query spelling correction. Although the two tasks share many common techniques, they have different concerns. This paper presents our work on the CLP-2014 bake-off. The task focuses on spelling checking of Chinese essays written by foreigners. Compared to online sear...


Statistical denormalization for Arabic text

2012

Mohammed Moussa Mohammed Fakhr Kareem Darwish

In this paper, we focus on a sub-problem of Arabic text error correction, namely Arabic Text Denormalization. Text Denormalization is considered an important post-processing step when performing machine translation into Arabic. We examine different approaches for denormalization via the use of language modeling, stemming, and sequence labeling. We show the effectiveness of different approaches ...


Language Independent Text Correction using Finite State Automata

2008

Ahmed Hassan Awadallah Sara Noeman Hany Hassan

Many natural language applications, like machine translation and information extraction, are required to operate on text with spelling errors. Those spelling mistakes have to be corrected automatically to avoid deteriorating the performance of such applications. In this work, we introduce a novel approach for automatic correction of spelling mistakes by deploying finite state automata to propos...


Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System

2010

Na-Rae Han Joel R. Tetreault Soo-Hwa Lee Jin-Young Ha

This paper presents research on building a model of grammatical error correction, for preposition errors in particular, in English text produced by language learners. Unlike most previous work which trains a statistical classifier exclusively on well-formed text written by native speakers, we train a classifier on a large-scale, error-tagged corpus of English essays written by EFL learners, rel...


Phonetically Distributed Continuous Speech Corpus for Thai Language

2002

Chai Wutiwiwatchai Patcharika Cotsomrong Sinaporn Suebvisai Supphanat Kanokphara

This paper presents work on phonetically balanced (PB) and phonetically distributed (PD) sentence sets, which form part of the text prompts for speech recording in a Large Vocabulary Continuous Speech Recognition (LVCSR) corpus for the Thai language. Firstly, a protocol of Thai phonetic transcription and some essential rules of phonetic correction after the grapheme-to-phoneme (G2P) process are ...


A Preprocessing Model for Hand-written Arabic Texts Based on Voronoi Diagrams

2016

Atallah M. Al-Shatnawi

In this paper, a preprocessing model for hand-written Arabic text on the basis of the Voronoi Diagrams (VDs) is presented and discussed. The proposed VD-based pre-processing model consists of five stages: a preparatory stage, page segmentation, thinning, baseline estimation, and slanting correction. In the preparatory stage, the text image is converted via VDs into a group of geometrical forms ...


CUFE@QALB-2015 Shared Task: Arabic Error Correction System

2015

Michael Nawar

In this paper, we describe the implementation of an Arabic error correction system developed for the WANLP-2015 shared task on automatic error correction for Arabic text. We propose improvements to a previous statistical rule-based system: we use word patterns to improve error correction, and we also apply a statistical system to the syntactic error correction rules. The system achi...


Correcting Errors in a New Gold Standard for Tagging Icelandic Text

2014

Sigrún Helgadóttir Hrafn Loftsson Eiríkur Rögnvaldsson

In this paper, we describe the correction of PoS tags in a new Icelandic corpus, MIM-GOLD, consisting of about 1 million tokens sampled from the Tagged Icelandic Corpus, MÍM, released in 2013. The goal is to use the corpus, among other things, as a new gold standard for training and testing PoS taggers. The construction of the corpus was first described in 2010 together with preliminary work on...


Book Cover Recognition

2016

Linfeng Yang Xinyu Shen

Here we developed a MATLAB-based Graphical User Interface (GUI) that lets people check information about desired books in real time. The GUI allows the user to take photos of a book cover. It then automatically detects features of the input image using the MSER algorithm and filters out non-text features based on morphological differences between text and non-text regions. In order to further ...


Using Error-Annotated ESL Data to Develop an ESL Error Correction System

2010

Na-Rae Han Joel Tetreault Soo-Hwa Lee Jin-Young Ha

This paper presents research on building a model of grammatical error correction, for preposition errors in particular, in English text produced by language learners. Unlike most previous work which trains a statistical classifier exclusively on well-formed text written by native speakers, we train a classifier on a large-scale, error-tagged corpus of English essays, relying on contextual and g...
