On “Article Omission” in German and the “Uniform Information Density Hypothesis”
Abstract
This paper investigates whether Information Theory (IT) in the tradition of Shannon (1948), and in particular the “Uniform Information Density Hypothesis” (UID; see Jaeger 2010), might contribute to our understanding of a phenomenon referred to in the literature as “article omission” (AO). To this end, we trained language models on a corpus of 17 different text types (ranging from prototypically written text types such as legal texts to prototypically spoken text types such as dialogue), with about 2,000 sentences each, and compared the density profiles of minimal pairs. Our results suggest, firstly, that an overtly realized article significantly reduces the surprisal on the following head noun (as expected). They also show, however, that omitting the article results in a non-uniform distribution of information, thus contradicting the UID. Since AO empirically does not seem to depend on specific lexical items, we also trained our language models on a more abstract level, namely part of speech (POS). At this level of analysis we were again able to show that an overtly realized article significantly reduces the surprisal on the following head noun, but at the same time AO results in a more uniform distribution of information. In the case of AO, the UID thus seems to operate on the level of POS rather than on the lexical level.
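The quantities the abstract relies on can be illustrated in a few lines: the surprisal of a word w_i is -log2 P(w_i | context), a sentence's density profile is the sequence of these values, and the UID predicts that speakers prefer profiles without sharp peaks. The following is a minimal sketch under stated assumptions, not the authors' setup. The abstract does not say which language models were trained, so a bigram model with add-one (Laplace) smoothing is assumed; the toy corpus and the helper names `train_bigram`, `surprisal_profile`, and `non_uniformity` are hypothetical; and the variance of surprisal is used as one simple proxy for (non-)uniformity, not necessarily the measure used in the paper.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and context (history) frequencies; each sentence is padded with <s>."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent
        vocab.update(sent)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return bigrams, contexts, len(vocab) + 1     # +1 reserves mass for unseen words


def surprisal_profile(sent, model):
    """Per-token surprisal -log2 P(w_i | w_{i-1}) under add-one smoothing (assumed model)."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + sent
    return [
        -math.log2((bigrams[(prev, w)] + 1) / (contexts[prev] + vocab_size))
        for prev, w in zip(tokens, tokens[1:])
    ]


def non_uniformity(profile):
    """Variance of a surprisal profile: lower means more uniform (UID-like). Proxy measure only."""
    mean = sum(profile) / len(profile)
    return sum((s - mean) ** 2 for s in profile) / len(profile)


# Hypothetical toy corpus; NOT the 17-text-type corpus used in the paper.
TRAIN = [s.split() for s in [
    "der Arzt kommt", "der Arzt lacht",
    "die Frau kommt", "die Frau lacht",
    "Arzt kommt",                                 # a sentence with article omission
]]
model = train_bigram(TRAIN)

# Minimal pair: overt article vs. omitted article.
with_article = surprisal_profile("der Arzt kommt".split(), model)
without_article = surprisal_profile("Arzt kommt".split(), model)

print("with article:   ", [round(s, 3) for s in with_article])
print("without article:", [round(s, 3) for s in without_article])
print("variance:", non_uniformity(with_article), non_uniformity(without_article))
```

On this toy corpus the head noun *Arzt* carries log2 3 ≈ 1.58 bits after *der* but log2 6 ≈ 2.58 bits sentence-initially, and the profile without the article has the higher variance. Because the functions treat tokens as opaque strings, the same code could be run on part-of-speech sequences (for instance STTS tags such as `ART`, `NN`, `VVFIN`) to approximate a POS-level analysis; the toy numbers only illustrate the mechanics and are not the paper's results.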
Similar resources
Information density of encodings: The role of syntactic variation in comprehension
The Uniform Information Density (UID) hypothesis links production strategies with comprehension processes, predicting that speakers will utilize flexibility in encoding in order to increase uniformity in the rate of information transmission, as measured by surprisal (Jaeger, 2010). Evidence in support of UID comes primarily from studies focusing on word-level effects, e.g. demonstrating that su...
Uniform Information Density at the Level of Discourse Relations: Negation Markers and Discourse Connective Omission
About half of the discourse relations annotated in Penn Discourse Treebank (Prasad et al., 2008) are not explicitly marked using a discourse connective. But we do not have extensive theories of when or why a discourse relation is marked explicitly or when the connective is omitted. Asr and Demberg (2012a) have suggested an information-theoretic perspective according to which discourse connectiv...
Uniform Surprisal at the Level of Discourse Relations: Negation Markers and Discourse Connective Omission
About half of the discourse relations annotated in Penn Discourse Treebank (Prasad et al., 2008) are not explicitly marked using a discourse connective. But we do not have extensive theories of when or why a discourse relation is marked explicitly or when the connective is omitted. Asr and Demberg (2012a) have suggested an information-theoretic perspective according to which discourse connectiv...
Acquisition and Accurate Use of English Articles by Persian Speakers
This study was conducted with the purpose of examining Persian speakers’ article acquisition and use with reference to Ionin, Ko and Wexler’s (2004) model, which is based on the prediction of Fluctuation Hypothesis (FH) that EFL learners of [-article] languages, like Persian, make erroneous article use in [+definite, -specific] and [-definite, +specific] contexts. From among the students of an ...
A New Method for Sperm Detection in Infertility Cure: Hypothesis Testing Based on Fuzzy Entropy Decision
In this paper, a new method is introduced for sperm detection in microscopic images for infertility treatment. In this method, firstly a hypothesis testing function is defined to separate sperms from plasma, non-sperm semen particles and noise. Then, some primary candidates are selected for sperms by watershed-based segmentation algorithm. Finally, candidates are either confirmed or rejected us...