Tools for Collocation Extraction: Preferences for Active vs. Passive
Abstract
We present and partially evaluate procedures for extracting noun+verb collocation candidates from German text corpora, together with their morphosyntactic preferences, in particular for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision evaluation on administrative texts from the European Union: we find a considerable number of specialized collocations, as well as general ones and complex predicates. Overall, the precision is considerably higher than that of a statistical extractor used as a baseline.
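To illustrate what a per-collocation voice preference could look like, here is a minimal Python sketch. It is not the authors' tooling: it assumes the extraction patterns have already produced one `(noun_lemma, verb_lemma, voice)` tuple per corpus hit, and the function name `voice_preferences`, the `min_freq` threshold, and the sample hits are all hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict


def voice_preferences(hits, min_freq=2):
    """Summarize active/passive usage per noun+verb pair.

    hits: iterable of (noun_lemma, verb_lemma, voice) tuples, where voice
          is "active" or "passive" (an assumed, hypothetical input format).
    Returns {(noun, verb): (total, passive_share)} for pairs whose total
    count is at least min_freq.
    """
    counts = defaultdict(Counter)
    for noun, verb, voice in hits:
        counts[(noun, verb)][voice] += 1

    prefs = {}
    for pair, by_voice in counts.items():
        total = by_voice["active"] + by_voice["passive"]
        # Skip pairs with no usable hits or below the frequency threshold.
        if total and total >= min_freq:
            prefs[pair] = (total, by_voice["passive"] / total)
    return prefs


# Invented sample hits, not taken from the evaluated corpus.
hits = [
    ("Maßnahme", "treffen", "passive"),
    ("Maßnahme", "treffen", "passive"),
    ("Maßnahme", "treffen", "active"),
    ("Rolle", "spielen", "active"),
    ("Rolle", "spielen", "active"),
]
prefs = voice_preferences(hits)
for (noun, verb), (total, share) in sorted(prefs.items()):
    print(f"{noun} + {verb}: n={total}, passive share={share:.2f}")
```

A raw share like this is only a starting point; with small counts, a lexicographer would probably want a significance test or a comparison against the verb's overall passive rate before calling it a preference.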
Similar resources
Tax Avoidance and Institutional Ownership: Active vs. Passive Ownership
Income tax is one of the most important costs for companies, and it is usually regarded as a cost to be avoided. One of the most noticeable and influential factors in tax avoidance is corporate ownership structure. Focusing on institutional ownership and its types, this paper attempts to measure the effect of this ownership and its types on corporate tax avoidance. F...
Automatic Collocation Extraction and Classification of Automatically Obtained Bigrams
This paper focuses on automatic determination of the distributional preferences of words in Russian. We present a comparison of six different measures for collocation extraction, some of which are widely known, while others are less prominent or new. For these metrics we evaluate the semantic stability of automatically obtained bigrams beginning with single-token prepositions. Manual annotatio...
Automatic Term and Collocation Extraction from English-Croatian corpus
Term and collocation bases are valuable additional resources covering a specific domain and frequent expressions, which can then be used in further research. The paper presents a possible model for building a terminology and collocation base, using statistical and linguistic approaches, in order to gain experience in building such resources for the English-Croatian language pair. The aim of ...
Identifying Morphosyntactic Preferences in Collocations
In this paper, we describe research that aims to make evidence on the morphosyntactic preferences of collocations available to lexicographers. Our methods for the extraction of appropriate frequency data and its statistical analysis are applied to the number and case preferences of German adjective+noun combinations in a small case study.
Prevalence of Active and Passive Smoking among Adult Population: Findings of a Population-Based Survey in Kerman (KERCADR), Iran
Background: Smoking is one of the major modifiable risk factors for non-communicable diseases. Our aim was to report the pattern of active and passive smoking using data collected through a population-based household survey in Kerman, Iran. Methods: Using a cluster random sampling design, we recruited 5900 adults (15-75 years old) into a survey. After consenting, every participant was ...