Correction of Noisy Sentences using a Monolingual Corpus
نویسنده
چکیده
Correction of Noisy Natural Language Text is an important and well studied problem in Natural Language Processing. It has a number of applications in domains like Statistical Machine Translation, Second Language Learning and Natural Language Generation. In this work, we consider some statistical techniques for Text Correction. We define the classes of errors commonly found in text and describe algorithms to correct them. The data has been taken from a poorly trained Machine Translation system. The algorithms use only a language model in the target language in order to correct the sentences. We use phrase based correction methods in both the algorithms. The phrases are replaced and combined to give us the final corrected sentence. We also present the methods to model different kinds of errors, in addition to results of the working of the algorithms on the test set. We show that one of the approaches fail to achieve the desired goal, whereas the other succeeds well. In the end, we analyze the possible reasons for such a trend in performance. Chapter
منابع مشابه
Improving Translation Fluency with Search-Based Decoding and a Monolingual Statistical Machine Translation Model for Automatic Post-Editing
The BLEU scores and translation fluency for the current state-of-the-art SMT systems based on IBM models are still too low for publication purposes. The major issue is that stochastically generated sentences hypotheses, produced through a stack decoding process, may not strictly follow the natural target language grammar, since the decoding process is directed by a highly simplified translation...
متن کاملAn Efficient Technique for De-Noising Sentences using Monolingual Corpus and Synonym Dictionary
We describe a method of correcting noisy output of a machine translation system. Our idea is to consider di erent phrases of a given sentence, and nd appropriate replacements of some of these from the frequently occurring similar phrases in the monolingual corpus. The frequent phrases in the monolingual corpus are indexed by a search engine. When looking for similar phrases we consider phrases ...
متن کاملCollocation Extraction Using Monolingual Word Alignment Method
Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to monolingual scenario to extract collocations from monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. Then the mon...
متن کاملAutomated Whole Sentence Grammar Correction Using a Noisy Channel Model
Automated grammar correction techniques have seen improvement over the years, but there is still much room for increased performance. Current correction techniques mainly focus on identifying and correcting a specific type of error, such as verb form misuse or preposition misuse, which restricts the corrections to a limited scope. We introduce a novel technique, based on a noisy channel model, ...
متن کاملMethod for Retrieving a Similar Sentence and Its Application to Machine Translation
In this paper, we propose incorporating similar sentence retrieval in machine translation to improve the translation of hard-to-translate input sentences. If a given input sentence is hard to translate, a sentence similar to the input sentence is retrieved from a monolingual corpus of translatable sentences and then provided to the MT system instead of the original sentence. This method is adva...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1105.4318 شماره
صفحات -
تاریخ انتشار 2011