Norwegian Native Language Identification
Abstract
We present a study of Native Language Identification (NLI) using data from learners of Norwegian, a language not previously used for this task. NLI is the task of predicting a writer’s first language (L1) using only their writing in a learned language. We find that three feature types are useful here: function words, part-of-speech n-grams, and a hybrid part-of-speech/function word mixture n-gram model. Our system achieves an accuracy of 79% against a baseline of 13% for predicting an author’s L1. The same features can distinguish non-native writing from native writing with 99% accuracy. We also find that part-of-speech n-gram performance on this data deviates from previous NLI results, possibly due to the use of manually post-corrected tags.
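The three feature types named in the abstract can be illustrated with a minimal sketch. It assumes that the hybrid mixture keeps function words as surface forms and replaces every other token with its part-of-speech tag; the abstract does not spell this out, so treat it as one plausible reading rather than the authors' exact procedure. The function-word list, the tag names, and the helpers `mixture_sequence` and `ngram_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical, tiny function-word list (illustration only, not the paper's list).
FUNCTION_WORDS = {"jeg", "og", "i", "på", "det", "en", "er", "som"}


def mixture_sequence(tagged_tokens, function_words=FUNCTION_WORDS):
    """Keep function words as surface forms; replace other tokens with their POS tag.

    This substitution rule is an assumption about how the hybrid feature is built.
    """
    return [
        token.lower() if token.lower() in function_words else pos
        for token, pos in tagged_tokens
    ]


def ngram_counts(sequence, n):
    """Count contiguous n-grams over a sequence of symbols."""
    return Counter(
        tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)
    )


# Hand-tagged toy sentence ("Jeg bor i Oslo og liker byen"); tags are illustrative.
tagged = [
    ("Jeg", "PRON"), ("bor", "VERB"), ("i", "ADP"), ("Oslo", "PROPN"),
    ("og", "CCONJ"), ("liker", "VERB"), ("byen", "NOUN"),
]

# Feature type 1: function-word counts.
function_word_counts = Counter(
    token.lower() for token, _ in tagged if token.lower() in FUNCTION_WORDS
)
# Feature type 2: part-of-speech n-grams (bigrams here).
pos_bigrams = ngram_counts([pos for _, pos in tagged], 2)
# Feature type 3: hybrid POS/function-word mixture n-grams.
mixed_bigrams = ngram_counts(mixture_sequence(tagged), 2)
```

In a full system, counts like these would be turned into feature vectors per learner text and passed to a classifier; that step is omitted because the abstract does not say which classifier was used.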
Similar resources
Desensitization in Norwegian Vowel Perception by Native and American English Listeners
Some differences in speech perception by native and nonnative listeners can be accounted for as transference from a native language. Others appear to result from universal preferences, such as duration. In the case of L2 vowel perception, duration may be used to categorize vowels when, from a non-native listener's perspective, inadequate spectral cues are available. The perception of Norwegian ...
String Kernels for Native Language Identification: Insights from Behind the Curtains
The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using...
Automatic evaluation of quantity contrast in non-native Norwegian speech
Computer-assisted pronunciation training (CAPT) has been shown to be effective for teaching non-native speakers the pronunciation details of a new language. No automatic pronunciation evaluation system exists for non-native Norwegian. We present initial experiments on the Norwegian quantity contrast between short and long vowels. A database of native and non-native speakers was recorded for training and test re...
The influence of non-native morphosyntax on the intelligibility of a closely related language
This study investigates the effect of morphosyntactic differences on our ability to comprehend a closely related language. Previous studies of mutual intelligibility, or receptive bilingualism, have focussed largely on the role of extra-linguistic, lexical, or phonetic factors. Although there is reason to believe that differences in morphology and syntax might worsen the ability to comprehend a...
Temporal factors in the production of Norwegian as a second language: Some preliminary results
Using speech material from the project ‘Languages in Contact’, this paper presents some results on the production of Norwegian spoken by immigrants. The material investigated consists of sentences read by speakers of six different languages and a control group of Norwegian native speakers. The preliminary measurements reported on here involved speech rate, pauses, the duration ratio content/fu...