Investigating the content and form of referring expressions in Mandarin: introducing the Mtuna corpus
Authors
Abstract
East Asian languages are thought to handle reference differently from English, particularly in terms of the marking of definiteness and number. We present the first Data-Text corpus for Referring Expressions in Mandarin, and we use this corpus to test some initial hypotheses inspired by the theoretical linguistics literature. Our findings suggest that function words deserve more attention in Referring Expression Generation than they have so far received, and they have a bearing on the debate about whether different languages make different trade-offs between clarity and brevity.
Similar resources
Probabilistic Refinement Algorithms for the Generation of Referring Expressions
We propose an algorithm for the generation of referring expressions (REs) that adapts the approach of Areces et al. (2008, 2011) to include overspecification and probabilities learned from corpora. After introducing the algorithm, we discuss how probabilities required as input can be computed for any given domain for which a suitable corpus of REs is available, and how the probabilities can be ...
Spatial Descriptions as Referring Expressions in the MapTask Domain
We discuss work-in-progress on a hybrid approach to the generation of spatial descriptions, using the maps of the Map Task dialogue corpus as domain models. We treat spatial descriptions as referring expressions that distinguish particular points on the maps from all other points (potential ‘distractors’). Our approach is based on rule-based overgeneration of spatial descriptions combined with ...
Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities
This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...
The Effect of Rootstocks on the Peel Phenolic Compounds of Clementine Mandarin (Citrus clementina)
Studies have shown that phenolic compounds are important in human health. The purpose of this research was to examine the influence of rootstocks on phenolic compounds. The content of individual phenolic compounds in fruits was determined by HPLC. Total flavonoid content was measured using a colorimetric method. Free radical scavenging activity on stable DPPH radicals was also evaluated. HPLC ana...
Individual Variation in the Choice of Referential Form
This study aims to measure the variation between writers in their choices of referential form by collecting and analysing a new and publicly available corpus of referring expressions. The corpus is composed of referring expressions produced by different participants in identical situations. Results, measured in terms of normalized entropy, reveal substantial individual variation. We discuss the...