Computing Idioms Frequency in Text Corpora

Author

  • Jan Busta
Abstract

Idioms are phrases whose meaning is not composed from the meanings of the individual words in the phrase. They are a natural example of a violation of the principle of compositionality, which makes idioms a meaning-mining problem in natural language processing. Counting the frequency of phrases such as idioms in corpora has one main aim: to find out which phrases are used often and which are used less often. This is a first step toward deriving the meaning of whole phrases rather than of each word alone, which in turn improves natural language understanding.
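The abstract does not describe how the counting is done, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a fixed list of idioms and counts exact, contiguous, case-insensitive matches; the names `tokenize` and `idiom_frequencies`, the sample corpus, and the idiom list are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idiom_frequencies(corpus, idioms):
    """Count exact, contiguous occurrences of each idiom in the corpus.

    Assumption: idioms appear in their citation form. Inflected or
    interrupted variants ("kicked the bucket") are NOT matched.
    """
    counts = Counter({idiom: 0 for idiom in idioms})
    idiom_tokens = {idiom: tokenize(idiom) for idiom in idioms}
    for document in corpus:
        tokens = tokenize(document)
        for idiom, parts in idiom_tokens.items():
            n = len(parts)
            # Slide a window of the idiom's length over the document.
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == parts:
                    counts[idiom] += 1
    return counts


# Hypothetical toy corpus and idiom list, for illustration only.
corpus = [
    "He promised not to spill the beans.",
    "Do not spill the beans before the party.",
    "Nobody expected him to kick the bucket so early.",
]
idioms = ["kick the bucket", "spill the beans", "piece of cake"]

freq = idiom_frequencies(corpus, idioms)
for idiom, count in freq.most_common():
    print(idiom, count)
```

Exact matching like this undercounts idioms that occur in inflected or modified forms and cannot tell literal uses from idiomatic ones, so a realistic system would likely need lemmatization and some form of disambiguation on top of it.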

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Towards automatic retrieval of idioms in

The goal of this paper is to present a procedure for the automatic retrieval of idiomatic expressions from large text corpora. The procedure combines text segmentation techniques and Latent Semantic Analysis (Landauer, Foltz, Laham, 1998). Three indices were computed on the basis of the three-fold hypothesis that a) idiomatic expressions should have few neighbours, that b) idiomatic expressions...

Enhanced Phraseological Idiomaticity in Chinese Translational Texts: A Corpus-Based Study of Chinese Four-Character Idioms in Translational and Non-Translational Literal Texts

The aim of a corpus-based approach to the study of Chinese idioms in translational and non-translational texts is to test the preliminary hypothesis regarding the remarkable use of typical four-character expressions, especially idioms and collocations in Chinese translational texts, which has been conceived and developed largely from my Ph.D. dissertation on a corpus-based study of four-char...

Strategies Employed in Translation of Idioms in English Subtitles of Two Persian Television Series

Translation of idioms seems to be complicated for most translators since the meaning of idioms is difficult and sometimes impossible to be deduced from the meaning of their individual components. Considering the difficulties of translation of idioms and also the specific constraints of subtitling such as space and time limits, this research studied the strategies employed in translation of idio...

Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research

This paper describes a project aimed at converting a legacy representation of English idioms into an XML-based format. The project is set in the context of a large electronic English-Polish dictionary which contains several hundred formalized idiom descriptions and which has been released under the terms of a free license. In short, the project consists of three phases: cleaning up the dictiona...

Publication date: 2008