learner corpora

Detecting learning disorders in students' written production in the foreign language: Are learner corpora of any help?

Journal: Porta Linguarum Revista Interuniversitaria de Didáctica de las Lenguas Extranjeras, 2011

The IFCASL Corpus of French and German Non-native and Native Read Speech

2016

Jürgen Trouvain, Anne Bonneau, Vincent Colotte, Camille Fauth, Dominique Fohr, Denis Jouvet, Jeanin Jügler, Yves Laprie, Odile Mella, Bernd Möbius, Frank Zimmerer

The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable size and coverage. In contrast to most learner corpora, the...

Is core vocabulary a friend or foe of academic writing? Single-word vs multi-word uses of thing

Journal: Journal of English for Academic Purposes, 2021

Core vocabulary items (e.g. thing, way) are often viewed as the enemy of effective academic writing, and style guides and textbooks advise against using them. However, their bad reputation seems to stem from a single-word perspective that ignores the rich phraseological units they tend to figure in. In this study, we focus on the core lemma thing and investigate the extent to which a phraseological approach can redeem its reputation. We...

Annotating foreign learners’ Czech

2010

Barbora Štindlová

One of the challenges of contemporary corpus linguistics is the compilation and annotation of corpora consisting of texts produced by non-native speakers. In addition to morphosyntactic tagging and lemmatisation, such texts can be annotated with information relevant to the specific nonstandard use. Cases of deviant language use can be corrected and identified by a tag specifying the type of the e...

Generalization in Native Language Identification: Learners versus Scientists

2015

Sabrina Stehwien

Native Language Identification (NLI) is the task of recognizing an author’s native language from text in another language. In this paper, we consider three English learner corpora and one new, presumably more difficult, scientific corpus. We find that the scientific corpus is only about as hard to model as a less-controlled learner corpus, but cannot profit as much from corpus combinat...

Connector Usage in the English Essay Writing of Japanese EFL Learners

2004

Masumi Narita, Chieko Sato, Masatoshi Sugiura

In this paper we report on our quantitative analysis of 25 logical connectors in advanced Japanese university students’ essay writing and compare it with the use in comparable types of native English writing. We also present a brief comparison of the Japanese learners’ usage with that of advanced French, Swedish or Chinese learners of English. As our research targets, we chose 25 logical connec...
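Comparing connector use across learner and native writing usually means normalizing raw counts by corpus size. The Python sketch below is a minimal illustration of that arithmetic, not the authors' procedure: the connector list is a hypothetical subset (the paper's 25 connectors are not listed here), tokenization is a simple lowercase regex, and the per-10,000-token rate is an assumed normalization base.

```python
import re
from collections import Counter

# Hypothetical subset of logical connectors; the study's actual list of 25 is not reproduced here.
CONNECTORS = ["however", "moreover", "therefore", "in addition", "for example", "on the other hand"]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())


def connector_counts(text: str, connectors=CONNECTORS) -> tuple[Counter, int]:
    """Count each (possibly multi-word) connector; also return the total token count."""
    tokens = tokenize(text)
    counts = Counter()
    for conn in connectors:
        parts = conn.split()
        n = len(parts)
        # Slide a window of n tokens over the text and compare it to the connector phrase.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                counts[conn] += 1
    return counts, len(tokens)


def per_10k(counts: Counter, total: int) -> dict[str, float]:
    """Normalize raw counts to occurrences per 10,000 tokens so corpora of different sizes compare."""
    if total == 0:
        return {}
    return {conn: 10_000 * k / total for conn, k in counts.items()}
```

For example, `connector_counts("However, I think so. Moreover, it is good. However, it is bad.")` finds "however" twice and "moreover" once in 12 tokens; running the same functions over a native reference text and comparing the two `per_10k` dictionaries shows which connectors a learner group over- or underuses.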

Finding the Zone of Proximal Development: Student-Tutor Second Language Dialogue Interactions

2017

Arabella Sinclair, Jon Oberlander, Dragan Gasevic

The goal of dialogue practice for a second language learner is to facilitate their production of dialogue similar to that between native speakers. This paper explores the characteristics of student and tutor dialogue in terms of their differences from classic conversational and task-oriented corpora. Interlocutors have the tendency to align to the language of the other in conversational dialogu...

Grammatical Error Detection Using Error- and Grammaticality-Specific Word Embeddings

2017

Masahiro Kaneko, Yuya Sakaizawa, Mamoru Komachi

In this study, we improve grammatical error detection by learning word embeddings that consider grammaticality and error patterns. Most existing algorithms for learning word embeddings usually model only the syntactic context of words so that classifiers treat erroneous and correct words as similar inputs. We address the problem of contextual information by considering learner errors. Specifica...

Priors in Bayesian Learning of Phonological Rules

2004

Sharon Goldwater, Mark Johnson

This paper describes a Bayesian procedure for unsupervised learning of phonological rules from an unlabeled corpus of training data. Like Goldsmith’s Linguistica program (Goldsmith, 2004b), whose output is taken as the starting point of this procedure, our learner returns a grammar that consists of a set of signatures, each of which consists of a set of stems and a set of suffixes. Our grammars...
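The abstract describes the learner's output grammar as a set of signatures, each pairing a set of stems with a set of suffixes. The Python sketch below shows only that data structure, under stated assumptions: the class and function names are hypothetical, the empty string stands in for a null suffix, and nothing here implements the Bayesian learning procedure or the priors the paper studies.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Signature:
    """A set of stems that all combine with the same set of suffixes.

    Assumption for this sketch: the empty string "" represents a null suffix.
    """
    stems: frozenset[str]
    suffixes: frozenset[str]

    def words(self) -> set[str]:
        """Every stem+suffix combination this signature licenses."""
        return {stem + suf for stem, suf in product(self.stems, self.suffixes)}


def analyses(grammar: list[Signature], word: str) -> list[tuple[str, str]]:
    """All (stem, suffix) splits of `word` licensed by some signature in the grammar."""
    return sorted(
        (stem, suf)
        for sig in grammar
        for stem, suf in product(sig.stems, sig.suffixes)
        if stem + suf == word
    )
```

With `Signature(frozenset({"walk", "jump"}), frozenset({"", "s", "ed", "ing"}))`, the signature licenses eight word forms, and `analyses` splits "walked" as `("walk", "ed")`. Keeping stems and suffixes as separate sets is what lets one signature compactly cover every combination of its members.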

Online Inference-Rule Learning from Natural-Language Extractions

2013

Sindhu Raghavan, Raymond J. Mooney

In this paper, we consider the problem of learning commonsense knowledge in the form of first-order rules from incomplete and noisy natural-language extractions produced by an off-the-shelf information extraction (IE) system. Much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. The proposed rule learner accou...
