Like Finding a Needle in a Haystack: Annotating the American National Corpus for Idiomatic Expressions

Authors

  • Laura Street
  • Nathan Michalov
  • Rachel Silverstein
  • Michael Reynolds
  • Lurdes Ruela
  • Felicia Flowers
  • Angela Talucci
  • Priscilla Pereira
  • Gabriella Morgon
  • Samantha Siegel
  • Marci Barousse
  • Antequa Anderson
  • Tashom Carroll
  • Anna Feldman
Abstract

This paper presents the details of a pilot study in which we tagged portions of the American National Corpus (ANC) for idioms composed of verb-noun constructions, prepositional phrases, and subordinate clauses. The three data sets we analyzed were 1,500-sentence samples from the spoken, the non-fiction, and the fiction portions of the ANC. This paper provides the details of the tagset we developed, the motivation behind our choices, and the inter-annotator agreement measures we deemed appropriate for this task. In tagging the ANC for idiomatic expressions, our annotators achieved a high level of agreement (> .80) on the tags but a low level of agreement (< .00) on what constituted an idiom. These findings support the claim that identifying idiomatic and metaphorical expressions is a highly difficult and subjective task. In total, 135 idiom types and 154 idiom tokens were identified. Based on the total tokens found for each idiom class, we suggest that future research on idiom detection and idiom annotation include prepositional phrases, as this class of idioms occurred frequently in the non-fiction and spoken samples of our corpus.
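The abstract reports agreement values above .80 and below .00 but does not name the statistic used. As an illustration only, the sketch below computes Cohen's kappa for two annotators, a common chance-corrected agreement measure; the function name, the labels, and the sample annotations are hypothetical and are not taken from the study. It shows how a value below zero can arise when two annotators agree less often than chance would predict.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative sketch; the paper's actual agreement measure is not
    specified in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments on whether six phrases are idioms.
annotator_1 = ["idiom", "literal", "literal", "literal", "idiom", "literal"]
annotator_2 = ["literal", "idiom", "literal", "literal", "literal", "idiom"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

With these made-up labels the annotators agree on only two of six items while chance alone predicts more, so kappa comes out negative. A study with more than two annotators would typically need a multi-rater statistic instead.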




Publication date: 2010