Collocation Mining: Exploiting Corpora for Collocation Identification and Representation
Abstract
The work presented provides computational linguistics methods and tools for identifying collocations in arbitrary text, and for representing collocations in a relational database that integrates competence information (collocation-type-specific linguistic analysis) and performance information (corpus sentences). It differs from existing approaches to collocation identification in systematically utilizing collocation-type-specific linguistic information. With respect to collocation representation, the work is the first to combine, systematically and on a large scale, competence-based descriptions of collocations with actual occurrences in text.
Similar resources
Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction
Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocations of words, attached to specific concepts (e.g. genetic-algorithms and decision-trees are terms associated with the concept “Machine Learning”). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations man...
Domain Collocation Identification
In this paper we present a new method of automatic collocation identification. Collocation is an important relation between words, which is widely used, among others, in information retrieval tasks. Over the last years, many methods of automatic collocation acquisition from text corpora have been proposed. The approach described in this paper differs from the others by focusing on domain colloc...
Statistical Identification of Collocations in Large Corpora for Information Retrieval
The linguistic phenomenon of collocation, the habitual juxtaposition of some words in natural language, has been shown to benefit natural language processing tasks such as information retrieval. This paper examines the utility of several methods for collocation extraction for document retrieval, specifically for queries in question form.
Collocation Translation Acquisition Using Monolingual Corpora
Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...
Spatial association analysis: A literature review
The immense explosion of geographically referenced data calls for efficient discovery of spatial knowledge. Spatial association analysis is a typical data mining approach for discovering spatial knowledge. Association rules are patterns of the form X→Y, where pattern Y is likely to occur when pattern X occurs. One of the most famous patterns, Diapers → Beer, is a typical association rule example. Spa...