Extraction of Collocations from a Text Corpus: A Fuzzy Measure
Authors
Abstract
Automatic extraction of collocations from a corpus is a well-known problem in natural language processing. It is typically carried out with a statistical measure that indicates whether two words occur together more often than chance would predict. This article describes a fuzzy set theoretic approach for extracting collocations from a text collection. The approach proposes a fuzzy bi-gram index to find bi-grams in a collection. Longer collocations, i.e., n-grams (n > 2), are then obtained with the same fuzzy bi-gram index by treating the previously extracted shorter collocations as individual words. The performance of the proposed method is found to be quite promising, and it is better than that of the other widely used methods we considered.
Keywords: Collocation extraction, Fuzzy sets, Natural language processing, Corpus statistics, GENIA corpus.
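The iterative scheme described in the abstract (score bi-grams, then treat accepted bi-grams as single words to reach longer n-grams) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the fuzzy bi-gram index, so pointwise mutual information (PMI) is swapped in as a stand-in association measure. The function names, the underscore joining, and the `threshold` and `min_count` parameters are hypothetical and not taken from the article.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent token pairs by PMI (a stand-in, NOT the fuzzy bi-gram index)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def merge_collocations(tokens, pairs):
    """Rewrite the token list so each selected pair becomes one token (greedy, left to right)."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def extract_collocations(tokens, rounds=2, threshold=1.0, min_count=2):
    """Repeat bi-gram scoring, treating accepted pairs as single words in the next round."""
    found = set()
    for _ in range(rounds):
        scores = pmi_bigrams(tokens, min_count)
        selected = {pair for pair, score in scores.items() if score > threshold}
        if not selected:
            break
        found.update(a + "_" + b for a, b in selected)
        tokens = merge_collocations(tokens, selected)
    return found, tokens


if __name__ == "__main__":
    text = ("nuclear factor kappa binds dna while a nuclear factor kappa site "
            "recruits this nuclear factor kappa motif")
    found, merged = extract_collocations(text.split())
    print(sorted(found))
    print(merged)
```

With a real corpus, the threshold and minimum count would need tuning, and the stand-in score would be replaced by whatever index the article actually defines.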
Similar resources
Fuzzy Set Theoretic Approach To Collocation Extraction
The fuzzy approach deals with linguistic properties of elements, such as beauty, coldness, and hotness. Collocations are linguistically motivated. Deciding whether a word combination is a collocation is a linguistic judgment, since mere co-occurrence of word combinations does not signify the presence of a collocation. Collocation extraction can therefore be approached by looking at its linguistic aspect. In th...
The Application of Fuzzy Logic to Collocation Extraction
Collocations are important for many natural language processing tasks, such as information retrieval, machine translation, and computational lexicography. So far, many statistical methods have been used for collocation extraction. Almost all of these methods form a classical crisp set of collocations. We propose a fuzzy logic approach to collocation extraction to form a fuzzy set of collocations in ...
FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora
We introduce FipsCoView, an on-line interface for dictionary-like visualisation of collocations detected from parallel corpora using a syntactically-informed extraction method.
Extracting Academic Subjects Semantic Relations Using Collocations
The paper presents an approach to analyzing the semantic content of academic subjects and their internal relations using statistically based techniques for collocation extraction from a large electronic educational text corpus. It offers a survey and analysis of some related corpus-based approaches to extracting conceptual relations for educational purposes and presents a technique for semantic search of c...
Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction
Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocations of words, attached to specific concepts (e.g. genetic-algorithms and decision-trees are terms associated with the concept “Machine Learning”). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations man...