Corpus-based Metonymy Analysis
Abstract
In this paper we make the case for corpus-based metonymy analysis and show that many interesting linguistic and statistical questions can only be answered by working with real texts. To facilitate such studies, we present a method for annotating metonymies in domain- and genre-independent text. We advocate an annotation scheme that builds on regularities in metonymic usage, takes underspecification in metonymic reference into account, and is organised hierarchically. Combining previous metonymy classification proposals with insights from a corpus study, we present a fully worked-out annotation scheme for location names that illustrates these principles. We report several experiments measuring annotation agreement, which show that the scheme is reliable and has wide coverage. We also provide a gold standard for annotations of this kind, consisting of 2000 annotated occurrences of country names in the British National Corpus. We use the resulting corpus to study the distribution of metonymies and the factors that influence the choice between literal and metonymic readings in real texts.
Metonymy is a form of figurative speech in which one expression is used to refer to the standard referent of a related one (Lakoff & Johnson, 1980). Consider the following examples:
(1) He was shocked by Vietnam.
(2) The ham sandwich is waiting for his check.
In (1), “Vietnam”, the name of a location, refers to an event (a war) that happened there. In (2), “ham sandwich” refers to the customer who ordered the sandwich (Lakoff & Johnson, 1980; Stallard, 1993). Metonymy has generated considerable interest in linguistics (Stern, 1931; Lakoff & Johnson, 1980; Nunberg, 1995; Pustejovsky, 1995; Panther & Radden, 1999), and by now several of its characteristics have been brought to light and several interesting claims have been made:
1. Metonymic readings are very systematic: for example, location names can be productively used to refer to an associated event (see Example (1) and similar examples such as Woodstock). Linguistic studies (Stern, 1931; Lakoff & Johnson, 1980; Fass, 1997) have therefore postulated conventionalised metonymic patterns (e.g. place-for-event) that operate on semantic classes (here, locations).
2. Unconventional metonymies (see Example (2)) can be created on the fly, and their interpretation is context-dependent.
3. Metonymy seems to be quite frequent.
However, these insights stem mainly from studies based on linguistic intuition rather than on corpus data, and such studies are often biased towards a particular point of interest (for example, stressing metonymic patterns over unconventional metonymies or vice versa). Consequently, they are illustrated only by small sets of specially selected and/or constructed examples, cover only a limited range of what might be encountered in real-world texts, and do not necessarily give an accurate picture of the actual distribution of phenomena. In particular, they leave the following questions unanswered:
1. What is the actual distribution of literal readings, conventional metonymic patterns and unconventional metonymies in real-world texts?
2. Which factors (e.g., text type or word-specific behaviour) influence the distribution of metonymies?
3. How valid are the pattern lists proposed in the literature with regard to their coverage and granularity when applied to real text?
These questions are only well-formed if literal and metonymic readings, as well as different metonymic patterns, can be reliably distinguished. The clear-cut examples normally given in the literature hide the fact that such distinctions may be hard to make in practice. Indeed, it is unclear whether the proposed pattern lists can serve as an annotation scheme for the manual markup of metonymy.
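The first two questions listed above are, at bottom, counting questions over an annotated corpus. The minimal Python sketch below shows one way they might be computed, assuming each annotated occurrence is stored as a (lemma, label) pair. The label names, the grouping into conventional and unconventional readings, the function names and the toy data are all hypothetical illustrations, not the authors' inventory, code or results.

```python
from collections import Counter

# Hypothetical label inventory: only "place-for-event" appears in the text;
# "place-for-people" and "unconventional" are illustrative placeholders.
CONVENTIONAL_PATTERNS = {"place-for-event", "place-for-people"}


def coarse_class(label: str) -> str:
    """Map a fine-grained label to literal / conventional / unconventional."""
    if label == "literal":
        return "literal"
    if label in CONVENTIONAL_PATTERNS:
        return "conventional"
    return "unconventional"


def coarse_distribution(annotations):
    """Relative frequency of each coarse class over all occurrences (question 1)."""
    counts = Counter(coarse_class(label) for _, label in annotations)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def metonymy_rate_by_lemma(annotations):
    """Share of non-literal readings per lemma, one word-specific factor (question 2)."""
    totals = Counter()
    metonymic = Counter()
    for lemma, label in annotations:
        totals[lemma] += 1
        if label != "literal":
            metonymic[lemma] += 1
    return {lemma: metonymic[lemma] / totals[lemma] for lemma in totals}


# Made-up toy annotations (lemma, label); not data from the paper's corpus.
TOY = [
    ("Vietnam", "place-for-event"),
    ("Vietnam", "literal"),
    ("France", "literal"),
    ("France", "place-for-people"),
    ("France", "unconventional"),
]

print(coarse_distribution(TOY))
print(metonymy_rate_by_lemma(TOY))
```

Grouping by a different key, such as the text type of the source document, would follow the same pattern as the per-lemma function.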
In most work on metonymy, metonymies are identified on the basis of (undocumented) human intuitions (e.g., Stallard, 1993; Pustejovsky, 1995; Fass, 1997; Markert & Hahn, 2002), although such intuitions have proved unreliable in other areas of sense annotation (Jorgensen, 1990; Ng & Lee, 1996). Answering the questions above requires a large amount of natural language data analysed for metonymy, which unfortunately is not yet available. This paper describes such a corpus analysis of metonymies in English texts, centering on the following points:
1. Development of a reliable annotation scheme for literal vs. metonymic usage. The annotation scheme builds on the linguistic insights mentioned so far and uses metonymic patterns defined on semantic classes as its annotation categories. We extensively tested the scheme's reproducibility and coverage as well as the implications of its granularity.
2. Use of the annotation scheme to build a gold standard corpus of 2000 literal and metonymic examples of location names that mirrors, as far as possible, their original distribution in a corpus of English texts.
3. Exploration of the distribution of metonymies in the corpus.
4. Determination, using the gold standard corpus, of some of the factors that influence the choice between literal and metonymic usage.
In the next section we present our fully worked-out annotation scheme for location names, which can serve as a blueprint for annotation schemes for other semantic classes. Its reliability is rigorously evaluated in the evaluation section. We then present our gold standard corpus and discuss its distribution in the light of the above questions. As possible factors influencing the distribution of metonymies, we discuss the information coded in a semantic class vs. the information coded in a particular lemma, the influence of textual domain, and the influence of the “level” of writing. We end the paper with a discussion of related work and of our contributions.
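Reproducibility of an annotation scheme is usually quantified with a chance-corrected agreement coefficient. As an illustration only, the sketch below computes Cohen's kappa for two annotators who labelled the same occurrences. We assume this variant for simplicity; the evaluation reported in the paper may use a different kappa formulation, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one and the same label throughout: perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators for the same four occurrences.
annotator_1 = ["literal", "literal", "metonymic", "literal"]
annotator_2 = ["literal", "metonymic", "metonymic", "literal"]
print(cohen_kappa(annotator_1, annotator_2))  # prints 0.5
```

Here raw agreement is 0.75, but because both annotators favour "literal", half of that agreement is expected by chance, so kappa drops to 0.5. This correction is why kappa, rather than raw percentage agreement, is the usual yardstick for annotation reliability.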
An Annotation Scheme for Location Names
In this paper we concentrate on an annotation scheme for the semantic class of “locations”, which illustrates all properties of our general annotation framework (Markert & Nissim, 2002).
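To make the idea of a hierarchically organised scheme concrete, here is a minimal Python sketch under stated assumptions. Each category points to its parent, so fine-grained pattern labels can be collapsed to coarser levels. This allows, for example, comparing annotators who agree that a reading is metonymic but chose different patterns, and lets an annotator who is unsure of the exact pattern fall back to the coarser node. Only place-for-event comes from the text; the other labels, including a hypothetical "mixed" category, are illustrative placeholders rather than the authors' category list, and the function names are our own.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: each label points to its parent (None marks a
# top-level category). Only "place-for-event" is taken from the text; the
# other names, including "mixed", are illustrative placeholders.
HIERARCHY: Dict[str, Optional[str]] = {
    "literal": None,
    "metonymic": None,
    "mixed": None,  # placeholder for occurrences invoking both readings at once
    "place-for-event": "metonymic",
    "place-for-people": "metonymic",
    "unconventional": "metonymic",
}


def top_level(label: str) -> str:
    """Walk up the parent links until a top-level category is reached."""
    parent = HIERARCHY[label]
    while parent is not None:
        label, parent = parent, HIERARCHY[parent]
    return label


def agree_at_top_level(label_a: str, label_b: str) -> bool:
    """True if two annotations agree once both are collapsed to the top level."""
    return top_level(label_a) == top_level(label_b)


# An annotator unsure of the exact pattern may choose the coarser node "metonymic".
print(top_level("place-for-event"))                       # prints metonymic
print(agree_at_top_level("place-for-event", "metonymic"))  # prints True
print(agree_at_top_level("literal", "place-for-people"))   # prints False
```

Storing parent links rather than nested dictionaries keeps collapsing a label to any level a simple walk up the tree, and adding a new pattern requires only one new entry.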