Clustering based approach extracting collocations
Abstract
This study presents a collocation extraction approach based on clustering. It combines several classical measures that cover all aspects of a given corpus, and it suggests separating the bigrams found in the corpus into several disjoint groups according to how likely each group is to contain collocations. Groups in which collocations are very unlikely can then be excluded, which meaningfully reduces the search space.
Keywords: Natural Language Processing, Collocation, Clustering, Hypothesis Testing, Mutual Information
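The grouping idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which measures or which clustering algorithm the authors use, so everything here is an assumption: pointwise mutual information and a t-score stand in for the "classical measures" (chosen because the keywords mention mutual information and hypothesis testing), and a small hand-written k-means stands in for the clustering step. The function names are hypothetical.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Map each adjacent word pair to (PMI, t-score). Assumed measures."""
    n = len(tokens)
    n_bi = n - 1  # number of adjacent pairs
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    feats = {}
    for (w1, w2), c12 in bigrams.items():
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        pmi = math.log2((c12 / n_bi) / (p1 * p2))
        expected = p1 * p2 * n_bi  # count expected under independence
        t_score = (c12 - expected) / math.sqrt(c12)
        feats[(w1, w2)] = (pmi, t_score)
    return feats


def standardize(points):
    """Rescale every coordinate to zero mean and unit variance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic start; returns one label per point."""
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def cluster_bigrams(tokens, k=2):
    """Split the bigrams of a token list into k disjoint groups."""
    feats = bigram_features(tokens)
    keys = list(feats)
    labels = kmeans(standardize([feats[b] for b in keys]), k)
    return dict(zip(keys, labels))
```

On a real corpus one would call `cluster_bigrams(tokens, k)` and then inspect each group, discarding those whose members have uniformly low association scores. How many groups to use and which ones to drop are choices this sketch leaves open.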
Similar resources
Extracting Arabic Collocations Based on Jape Rules
The massive amount of digital information available in all disciplines has generated a critical need to organize and structure its content. Some of the existing tools for languages such as English or French can easily be adapted to Arabic. In some cases a simple configuration is sufficient, while in other cases significant modifications must be made to obtain acceptable results. We pres...
Collocational Translation Memory Extraction Based on Statistical and Linguistic Information
In this paper, we propose a new method for extracting bilingual collocations from a parallel corpus to provide phrasal translation memories. The method integrates statistical and linguistic information to achieve effective extraction of bilingual collocations. The linguistic information includes parts of speech, chunks, and clauses. The method involves first obtaining an extended list of Englis...
Extracting collocations and their translations from parallel corpora
Identifying collocations in a text (e.g., break record) and correctly translating them (battre record vs. *casser record) represent key issues in machine translation, notably because of their prevalence in language and their syntactic flexibility. This article describes a method for discovering translation equivalents for collocations from parallel corpora, aimed at increasing the lexical cover...
Extracting Verb-Noun Collocations from Text
In this paper, we describe a new method for extracting monolingual collocations. The method, which is based on statistical methods, extracts VN collocations from large textual corpora. Being able to extract a large number of collocations is critical to machine translation and many other applications. The method has an element of snowballing in it. Initially, one identifies a pattern that will prod...
Retrieving Collocations by Co-occurrences and Word Order Constraints
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method retrieves collocations in the following stages: 1) extracting strings of characters as units of collocations; 2) extracting recurrent combinations of strings in accordance with their word order in a corpus as collocations. Through the method, a wide range of collocations, especially...
Journal title: CoRR
Volume: abs/1207.2714
Publication date: 2012