Automated Grammar Correction Using Hierarchical Phrase-Based Statistical Machine Translation
نویسندگان
چکیده
We introduce a novel technique that uses hierarchical phrase-based statistical machine translation (SMT) for grammar correction. SMT systems provide a uniform platform for any sequence transformation task. Thus grammar correction can be considered a translation problem from incorrect text to correct text. Over the years, grammar correction data in the electronic form (i.e., parallel corpora of incorrect and correct sentences) has increased manifolds in quality and quantity, making SMT systems feasible for grammar correction. Firstly, sophisticated translation models like hierarchical phrase-based SMT can handle errors as complicated as reordering or insertion, which were difficult to deal with previously throuh the mediation of rule based systems. Secondly, this SMT based correction technique is similar in spirit to human correction, because the system extracts grammar rules from the corpus and later uses these rules to translate incorrect sentences to correct sentences. We describe how to use Joshua, a hierarchical phrase-based SMT system for grammar correction. An accuracy of 0.77 (BLEU score) establishes the efficacy of our approach.
منابع مشابه
Hierarchical Phrase-Based Statistical Machine Translation System
The aim of this thesis is to express fundamentals and concepts behind one of the emerging techniques in statistical machine translation (SMT) hierarchical phrase based MT by implementing translation from Hindi to English. Basically hierarchical model extends phrase based models by considering subphrases with the aid of context free grammar (CFG). In other models, syntax based models bear a rese...
متن کاملA Hierarchical Phrase-Based Model for Statistical Machine Translation
We present a statistical phrase-based translation model that uses hierarchical phrases— phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntaxbased translation systems without any linguistic commitment. In our experiments using BLEU ...
متن کاملUsing Features from Topic Models to Alleviate Over-Generation in Hierarchical Phrase-Based Translation
In hierarchical phrase-based translation systems, the grammars (SCFG rules) have over-generation problem because we can replace the non-terminalX with almost everything without knowing the syntactic or semantic role ofX . In this paper, we present an approach that uses topic models to learn the distributions for non-terminals in each SCFG rule, based on which we further derive static features f...
متن کاملExtending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT
In this paper, we describe two approaches to extending syntactic constraints in the Hierarchical Phrase-Based (HPB) Statistical Machine Translation (SMT) model using Combinatory Categorial Grammar (CCG). These extensions target the limitations of previous syntax-augmented HPB SMT systems which limit the coverage of the syntactic constraints applied. We present experiments on Arabic–English and ...
متن کاملHierarchical Phrase-Based Translation
We present a statistical machine translation model that uses hierarchical phrases—phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations. Thus it can be seen as combining fundamental ideas from both syntax-based translation and phrase-based translation. We describe our system’s training and ...
متن کامل