Improving Native Language Identification by Using Spelling Errors

نویسندگان

  • Lingzhen Chen
  • Carlo Strapparava
  • Vivi Nastase
چکیده

In this paper, we explore spelling errors as a source of information for detecting the native language of a writer, a previously under-explored area. We note that character n-grams from misspelled words are very indicative of the native language of the author. In combination with other lexical features, spelling error features lead to 1.2% improvement in accuracy on classifying texts in the TOEFL11 corpus by the author’s native language, compared to systems participating in the NLI shared task1.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Time of Memorization and English Spelling Difficulties among Iranian EFL Students in Malaysia

AbstractIn this study, phonological, morphological, and orthographical spelling difficulties were identified to examine the correlation between spelling difficulties and the time taken to memorize the spelling of words (time of memorization) among Iranian EFL students in Malaysia. The participants were 41 Iranian EFL students (20 male and 21 female) who were selected purposefully from an Irania...

متن کامل

Design and implementation of Persian spelling detection and correction system based on Semantic

Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors.  Also developing Persian tools will provide Persian progr...

متن کامل

Neural Networks and Spelling Features for Native Language Identification

We present the RUG-SU team’s submission at the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble, based on spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reachin...

متن کامل

LIMSI's participation to the 2013 shared task on Native Language Identification

This paper describes LIMSI’s participation to the first shared task on Native Language Identification. Our submission uses a Maximum Entropy classifier, using as features character and chunk n-grams, spelling and grammatical mistakes, and lexical preferences. Performance was slightly improved by using a twostep classifier to better distinguish otherwise easily confused native languages.

متن کامل

Pronunciation Modeling in Spelling Correction for Writers of English as a Foreign Language

We propose a method for modeling pronunciation variation in the context of spell checking for non-native writers of English. Spell checkers, typically developed for native speakers, fail to address many of the types of spelling errors peculiar to non-native speakers, especially those errors influenced by differences in phonology. Our model of pronunciation variation is used to extend a pronounc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017