Arabic Native Language Identification
Abstract
In this paper we present the first application of Native Language Identification (NLI) to Arabic learner data. NLI, the task of predicting a writer’s first language from their writing in other languages, has been investigated mostly with English data, but is now expanding to other languages. Using L2 texts from the newly released Arabic Learner Corpus, we demonstrate that a combination of three syntactic features (CFG production rules, Arabic function words and part-of-speech n-grams) is useful for this task. Our system achieves an accuracy of 41% against a baseline of 23%, providing the first evidence for classifier-based detection of language transfer effects in L2 Arabic. Such methods can be useful for studying language transfer, for developing teaching materials tailored to students’ native language, and for forensic linguistics. Future directions are discussed.
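As a rough illustration of two of the feature types named in the abstract, the sketch below counts part-of-speech n-grams and computes relative frequencies of function words. It is not the authors' implementation: the helper names `pos_ngrams` and `function_word_freqs`, the tag labels, and the three-word function-word list are assumptions made only for this example, and the input is assumed to be already tokenized and tagged.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def function_word_freqs(tokens, function_words):
    """Relative frequency of each listed function word among all tokens."""
    counts = Counter(t for t in tokens if t in function_words)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: counts[w] / total for w in function_words}


# Hypothetical toy input: the tag names and the word list are illustrative only.
tags = ["NOUN", "VERB", "NOUN", "VERB", "NOUN"]
bigrams = pos_ngrams(tags, 2)

tokens = ["في", "البيت", "من", "في"]
function_words = ["في", "من", "على"]
freqs = function_word_freqs(tokens, function_words)
```

Feature dictionaries of this kind would typically be converted to vectors and passed to a standard classifier; which classifier and feature weighting the paper uses is not stated in the abstract.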
Similar resources
Perceptual confusions of American-English vowels and consonants by native Arabic bilinguals.
This study investigated the perception of American-English (AE) vowels and consonants by young adults who were either (a) early Arabic-English bilinguals whose native language was Arabic or (b) native speakers of the English dialects spoken in the United Arab Emirates (UAE), where both groups were studying. In a closed-set format, participants were asked to identify 12 AE vowels presented in /h...
Formulation of Language Teachers' Identity in the Situated Learning of Language Teaching Community of Practice
A community of practice may shape and reshape the identity of members of the community through providing them with situated learning or learning environment. This study, therefore, is to clarify the salient learning-based features of the language teaching community of practice that might formulate the identity of language teachers. To this end, the study examined how learning situations in two ...
Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
We propose in this work a novel acoustic phonetic study for Arabic people suffering from language disabilities and non-native learners of Arabic language to classify Arabic continuous speech to pathological or healthy and to identify phonemes that pose pronunciation problems (case of pathological speeches). The main idea can be summarized in comparing between the phonetic model reference to Ara...
String Kernels for Native Language Identification: Insights from Behind the Curtains
The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using...
Patterns of Misperception of Arabic Consonants
There has been much investigation into perception of speech sounds, demonstrating a range of influences including listeners’ native language (e.g. Cutler et al. 2004), the sounds’ position in the syllable (e.g. Wang and Bilger 1973), and the presence of different types of masking noise (e.g. Phatak, Lovitt, and Allen 2008). However, there is no data on patterns of misperception of guttural cons...