Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus

نویسندگان

  • Na-Rae Han
  • Martin Chodorow
  • Claudia Leacock
چکیده

One of the most difficult challenges faced by non-native speakers of English is mastering the system of English articles. We trained a maximum entropy classifier to select among a/an, the, or zero article for noun phrases, based on a set of features extracted from the local context of each. When the classifier was trained on 6 million noun phrases, its performance was correct about 88% of the time. We also used the classifier to detect article errors in the TOEFL essays of native speakers of Chinese, Japanese, and Russian. Agreement with human annotators was about 88% (kappa = 0.36). Many of the disagreements were due to the classifier s lack of discourse information. Performance rose to 94% agreement (kappa = 0.47) when the system accepted noun phrases as correct in cases where its own confidence was low.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detecting errors in English article usage by non-native speakers

One of the most difficult challenges faced by non-native speakers of English is mastering the system of English articles. We trained a maximum entropy classifier to select among a/an, the, or zero article for noun phrases (NPs), based on a set of features extracted from the local context of each. When the classifier was trained on 6 million NPs, its performance on published text was about 83% c...

متن کامل

A Maximum Entropy Approach to Chinese Spelling Check

Spelling check identifies incorrect writing words in documents. For the reason of input methods, Chinese spelling check is much different from English and it is still a challenging work. For the past decade years, most of the methods in detecting errors in documents are lexicon-based or probability-based, and much progress are made. In this paper, we propose a new method in Chinese spelling che...

متن کامل

Error Detection in Broadcast News ASR Using Markov Chains

This article addresses error detection in broadcast news automatic transcription, as a post-processing stage. Based on the observation that many errors appear in bursts, we investigated the use of Markov Chains (MC) for their temporal modelling capabilities. Experiments were conducted on a large Amercian English broadcast news corpus from NIST. Common features in error detection were used, all ...

متن کامل

Training Paradigms for Correcting Errors in Grammar and Usage

This paper proposes a novel approach to the problem of training classifiers to detect and correct grammar and usage errors in text by selectively introducing mistakes into the training data. When training a classifier, we would like the distribution of examples seen in training to be as similar as possible to the one seen in testing. In error correction problems, such as correcting mistakes mad...

متن کامل

Using Mostly Native Data to Correct Errors in Learners' Writing: A Meta-Classifier Approach

We present results from a range of experiments on article and preposition error correction for non-native speakers of English. We first compare a language model and errorspecific classifiers (all trained on large English corpora) with respect to their performance in error detection and correction. We then combine the language model and the classifiers in a meta-classification approach by combin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004