A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora
Abstract
Language complexity is an intriguing phenomenon that has been argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity has resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantifications of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages from 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS) and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, all of which are monolingual, and one new measure based on automatic word alignment across pairs of languages. We find strong correlations between all the measures, illustrating that expert judgements and automated approaches converge on similar complexity ratings and can be used interchangeably.
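The abstract does not name the three monolingual corpus measures or the correlation statistic used. As a minimal sketch, assuming the type-token ratio (TTR) as a stand-in corpus measure and Spearman rank correlation as the comparison statistic, the core step (score each language on a corpus measure, then correlate those scores with expert-derived ratings) could look like the following. The corpora, the language labels `lang_a`/`lang_b`/`lang_c`, and the expert scores are hypothetical toy inputs, not data from the study, and the aggregation of WALS features into a single score is not shown.

```python
import math


def type_token_ratio(tokens):
    """Distinct word types divided by total word tokens (a simple corpus proxy)."""
    if not tokens:
        raise ValueError("corpus must contain at least one token")
    return len(set(tokens)) / len(tokens)


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy texts of equal length (8 tokens each), standing in for
    # parallel translations of the same passage; NOT data from the study.
    corpora = {
        "lang_a": "see dog see cat see dog see cat".split(),
        "lang_b": "sees dog saw cat see dogs seen cat".split(),
        "lang_c": "see dogs saw cat see dog saw cats".split(),
    }
    # Hypothetical expert-style ratings (higher = more morphological marking),
    # standing in for scores aggregated from WALS features.
    expert = {"lang_a": 0, "lang_b": 2, "lang_c": 1}

    langs = sorted(corpora)
    ttr = [type_token_ratio(corpora[lang]) for lang in langs]
    for lang, value in zip(langs, ttr):
        print(f"{lang}: TTR = {value:.3f}")
    rho = spearman(ttr, [expert[lang] for lang in langs])
    print(f"Spearman rho (TTR vs. expert ratings) = {rho:.3f}")
```

Because TTR falls as texts get longer, its values are only comparable across texts of similar length, which is one reason parallel corpora suit this kind of cross-linguistic comparison. Rank correlation is used here because expert ratings are ordinal; for real data, `scipy.stats.spearmanr` computes the same statistic.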
Similar resources
How Good are Typological Distances for Determining Genealogical Relationships among Languages?
The recent availability of typological databases such as World Atlas of Language Structures (WALS) has spurred investigations regarding their utility for language classification, the stability of typological features in genetic linguistics and typological universals across the language families of the world. Existing work on building NLP resources such as parallel corpora, treebanks for under-r...
Entropy Rate Estimates for Natural Language - A New Extrapolation of Compressed Large-Scale Corpora
One of the fundamental questions about human language is whether its entropy rate is positive. The entropy rate measures the average amount of information communicated per unit time. The question about the entropy of language dates back to experiments by Shannon in 1951, but in 1990 Hilberg raised doubt regarding a correct interpretation of these experiments. This article provides an in-depth e...
Cognitive Task Complexity and Iranian EFL Learners’ Written Linguistic Performance across Writing Proficiency Levels
Recently, tasks, as the basic units of syllabi, and cognitive complexity, as the criterion for sequencing them, have caught the attention of many second language researchers. This study sought to explore the effect of utilizing cognitively simple and complex tasks on the linguistic performance of high- and low-proficiency Iranian EFL writers, i.e., fluency, accuracy, lexical complexity, and structura...
Sociolinguistic Typology and Sign Languages
This paper examines the possible relationship between proposed social determinants of morphological 'complexity' and how this contributes to linguistic diversity, specifically via the typological nature of the sign languages of deaf communities. We sketch how the notion of morphological complexity, as defined by Trudgill (2011), applies to sign languages. Using these criteria, sign languages ap...
The Effect of Task Complexity on EFL Learners’ Narrative Writing Task Performance
This study examined the effects of task complexity on written narrative production under different task complexity conditions by EFL learners at different proficiency levels. Task complexity was manipulated along Robinson’s (2001b) proposed task complexity dimension of Here-and-Now (simple) vs. There-and-Then (complex) in. Accordingly, three specific measures of the written narratives were targ...