Tools for Collocation Extraction: Preferences for Active vs. Passive

Authors

  • Ulrich Heid
  • Marion Weller
Abstract

We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision evaluation on administrative texts from the European Union: we find a considerable number of specialized collocations, as well as general ones and complex predicates; overall, the precision is considerably higher than that of a statistical extractor used as a baseline.
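As a rough illustration of the idea in the abstract, the following Python sketch counts active vs. passive uses of noun+verb pairs in tagged, lemmatized clauses. It is not the authors' CQP patterns: the function names, the toy data, and the heuristic (a clause counts as passive when it contains a form of the auxiliary "werden" together with a past participle tagged VVPP in the STTS tagset) are assumptions made for this example only.

```python
from collections import Counter, defaultdict

# Each token is (word form, STTS part-of-speech tag, lemma).
# Hypothetical helper: pairs the last common noun of a clause with its full verb.
def classify_clause(clause):
    nouns = [lemma for _, tag, lemma in clause if tag == "NN"]
    participles = [lemma for _, tag, lemma in clause if tag == "VVPP"]
    finite_verbs = [lemma for _, tag, lemma in clause if tag == "VVFIN"]
    # Assumed heuristic: auxiliary "werden" plus a past participle signals passive.
    has_werden = any(
        tag.startswith("VA") and lemma == "werden" for _, tag, lemma in clause
    )
    if not nouns:
        return None
    if participles:
        voice = "passive" if has_werden else "active"
        return nouns[-1], participles[-1], voice
    if finite_verbs:
        return nouns[-1], finite_verbs[0], "active"
    return None

def voice_preferences(clauses):
    """Map each (noun lemma, verb lemma) pair to a Counter of observed voices."""
    prefs = defaultdict(Counter)
    for clause in clauses:
        hit = classify_clause(clause)
        if hit is not None:
            noun, verb, voice = hit
            prefs[(noun, verb)][voice] += 1
    return prefs

# Toy clauses, invented for illustration (not taken from the evaluated corpus).
SAMPLE = [
    [("Ein", "ART", "ein"), ("Antrag", "NN", "Antrag"),
     ("wird", "VAFIN", "werden"), ("gestellt", "VVPP", "stellen")],
    [("Die", "ART", "die"), ("Kommission", "NN", "Kommission"),
     ("stellt", "VVFIN", "stellen"), ("einen", "ART", "ein"),
     ("Antrag", "NN", "Antrag")],
    [("Der", "ART", "die"), ("Antrag", "NN", "Antrag"),
     ("wurde", "VAFIN", "werden"), ("gestellt", "VVPP", "stellen")],
]

prefs = voice_preferences(SAMPLE)
for pair, counts in prefs.items():
    print(pair, dict(counts))
```

A real extractor would work on chunked text and restrict the noun to the verb's object or subject chunk; the nearest-noun shortcut here only keeps the sketch short.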

Similar resources

Tax Avoidance and Institutional Ownership: Active vs. Passive Ownership

Income tax is one of the most important costs of companies and it is usually considered as a cost that should not be paid. One of the most noticeable and influential factors in tax avoidance is corporate ownership structure. Focusing on institutional ownership and its types, this paper attempts to measure the effect of this ownership and its types on corporate tax avoidance. F...

Automatic Collocation Extraction and Classification of Automatically Obtained Bigrams

This paper focuses on automatic determination of the distributional preferences of words in Russian. We present a comparison of six different measures for collocation extraction, some of which are widely known, while others are less prominent or new. For these metrics we evaluate the semantic stability of automatically obtained bigrams beginning with single-token prepositions. Manual annotatio...

Automatic Term and Collocation Extraction from English-Croatian corpus

Term and collocation bases represent valuable additional resources covering a specific domain and frequent expressions, which can then be used in further research. The paper presents a possible model for building a terminology and collocation base, using statistical and linguistic approaches in order to gain experience in building such resources for the English-Croatian language pair. The aim of ...

Identifying Morphosyntactic Preferences in Collocations

In this paper, we describe research that aims to make evidence on the morphosyntactic preferences of collocations available to lexicographers. Our methods for the extraction of appropriate frequency data and its statistical analysis are applied to the number and case preferences of German adjective+noun combinations in a small case study.

Prevalence of Active and Passive Smoking among Adult Population: Findings of a Population-Based Survey in Kerman (KERCADR), Iran

Background: Smoking is one of the major modifiable non-communicable disease risk factors. Our aim was to report the pattern of active and passive smoking using the data collected through a population-based household survey in Kerman, Iran. Methods: Given a cluster random sampling design, we recruited 5900 adults (15-75 years old) into a survey. After consenting, every participant was ...

Publication date: 2008