Exploiting Parse Structures for Native Language Identification

Authors

  • Sze-Meng Jojo Wong
  • Mark Dras
Abstract

Attempts to profile authors according to their characteristics extracted from textual data, including native language, have drawn attention in recent years, via various machine learning approaches utilising mostly lexical features. Drawing on the idea of contrastive analysis, which postulates that syntactic errors in a text are to some extent influenced by the native language of an author, this paper explores the usefulness of syntactic features for native language identification. We take two types of parse substructure as features (horizontal slices of trees, and the more general feature schemas from discriminative parse reranking) and show that using this kind of syntactic feature yields a classification accuracy of around 80% across seven native languages, an error reduction of more than 30%.
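To make the idea of "horizontal slices of trees" concrete, the sketch below reads each slice as a context-free production: a node's label together with the labels of its immediate children. That reading is an assumption made for illustration; the abstract does not spell out the exact feature definition, and the function names (`parse_bracketed`, `production_rules`, `rule_features`) and the example sentence are hypothetical. The code counts the productions found in a Penn-Treebank-style bracketed parse, which could then serve as a sparse feature vector for a classifier.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Leaves (words) are kept as plain strings inside the children list.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read()


def production_rules(tree, include_lexical=False):
    """List one 'PARENT -> CHILD ...' rule per internal node of the tree.

    Rules that rewrite a part-of-speech tag to a word are skipped unless
    include_lexical is True, so the features stay purely syntactic.
    """
    label, children = tree
    is_preterminal = all(isinstance(c, str) for c in children)
    rules = []
    if include_lexical or not is_preterminal:
        rhs = " ".join(c if isinstance(c, str) else c[0] for c in children)
        rules.append(f"{label} -> {rhs}")
    for child in children:
        if not isinstance(child, str):
            rules.extend(production_rules(child, include_lexical))
    return rules


def rule_features(bracketed):
    """Count production rules in one parse; usable as a sparse feature vector."""
    return Counter(production_rules(parse_bracketed(bracketed)))


if __name__ == "__main__":
    # Hypothetical learner sentence with a countability error.
    parse = "(S (NP (PRP I)) (VP (VBP have) (NP (JJ many) (NN informations))))"
    for rule, count in sorted(rule_features(parse).items()):
        print(count, rule)
```

Counts like these would typically be accumulated per document and passed to a standard classifier. The second feature type mentioned in the abstract, the feature schemas from discriminative parse reranking, is not covered by this sketch.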


Similar resources

Automatic detection of grammatical structures from non-native speech

This study focuses on the identification of grammatical structures that could serve as indices of the grammatical ability of non-native speakers of English. We obtain parse trees of manually transcribed non-native spoken responses using a statistical constituency parser and evaluate its performance on noisy sentences. We then use the parse trees to identify the grammatical structures of the Ind...


CCG parsing with one syntactic structure per n-gram

There is an inherent redundancy in natural languages whereby certain common phrases (or n-grams) appear frequently in general sentences, each time with the same syntactic analysis. We explore the idea of exploiting this redundancy by pre-constructing the parse structures for these frequent n-grams. When parsing sentences in the future, the parser does not have to re-derive the parse structure f...


Myanmar-English Bidirectional Machine Translation System with Numerical Particles Identification

In this paper, a Myanmar-English bidirectional machine translation system is developed using a rule-based machine translation approach. The Stanford and ML2KR parsers are used in the preprocessing step, where they generate the corresponding parse tree structures. The parsers also generate the corresponding CFG rules, which are collected and assembled into a synchronous context-free grammar SC...


The Comprehension of Garden-Path Structures by Iranian EFL Learners

The present study sought to investigate the comprehension of Garden-Path structures by Iranian EFL learners. Fifty female students of Kharazmi English Language Institute in Karaj, Iran, participated in this study. All of the participants were native speakers of Persian, and they ranged in age from 18 to 30. The participants were administ...


Move Structures in “Statement-of-the-Problem” Sections of M.A. Theses: The Case of Native and Nonnative Speakers of English

Understanding how to structure the “Statement-of-the-Problem” (SP) section of a thesis is necessary for EFL students to develop a logical argumentation for a problem statement. This study intended to compare Move structures of SP sections of theses written by native speakers of Persian (NSPs) and English (NSEs). To this end, 100 SP sections (50 SP sections written by NSE...




Publication date: 2011