Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features

Authors

  • Maximilian Köper
  • Sabine Schulte im Walde
Abstract

This paper addresses the automatic classification of preposition types in German, comparing hard and soft clustering approaches and various window- and syntax-based co-occurrence features. We show that (i) the semantically most salient preposition features (i.e., subcategorised nouns) are the most successful, and that (ii) soft clustering approaches are required for the task but behave quite differently when predicting ambiguity.
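The abstract does not name the specific clustering algorithms, so the following is only a minimal sketch of the hard/soft distinction it relies on. It assumes k-means as a stand-in for hard clustering and fuzzy c-means as a stand-in for soft clustering, and it represents prepositions by co-occurrence vectors over four hypothetical noun contexts. All counts are invented toy values, not data from the paper.

```python
import math

# Invented toy counts, NOT data from the paper: each preposition is a
# co-occurrence vector over four hypothetical noun contexts
# [place, time, cause, purpose].
VECTORS = {
    "auf":   [9.0, 2.0, 0.0, 1.0],
    "in":    [8.0, 4.0, 0.0, 0.0],
    "wegen": [0.0, 0.0, 9.0, 1.0],
    "trotz": [0.0, 1.0, 8.0, 0.0],
    "für":   [1.0, 3.0, 4.0, 8.0],
}


def normalise(v):
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, centroids, iterations=20):
    """Hard clustering (stand-in): each point gets exactly one cluster label."""
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else old)
        centroids = new_centroids
    return labels


def fuzzy_c_means(points, centroids, m=2.0, iterations=20):
    """Soft clustering (stand-in): each point gets a membership degree per cluster."""
    for _ in range(iterations):
        memberships = []
        for p in points:
            d = [max(dist(p, c), 1e-9) for c in centroids]  # avoid division by zero
            memberships.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1))
                                          for k in range(len(d)))
                                for j in range(len(d))])
        # Update centroids as membership-weighted means (weights u ** m).
        centroids = [
            [sum((u[j] ** m) * p[i] for u, p in zip(memberships, points))
             / sum(u[j] ** m for u in memberships)
             for i in range(len(points[0]))]
            for j in range(len(centroids))
        ]
    return memberships


names = list(VECTORS)
points = [normalise(VECTORS[n]) for n in names]
# Deterministic initialisation: seed the two clusters with "auf" and "wegen".
init = [points[names.index("auf")], points[names.index("wegen")]]
hard = dict(zip(names, k_means(points, init)))
soft = dict(zip(names, fuzzy_c_means(points, init)))

for n in names:
    print(f"{n:6} hard={hard[n]} soft=" + ", ".join(f"{u:.2f}" for u in soft[n]))
```

In this toy setting the hard output forces the invented `für` vector into a single cluster, while the soft output gives it a noticeably split membership. That split is one way a soft method can express the kind of preposition ambiguity that a hard method collapses into one label.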

Similar papers

Exploring Soft-Clustering for German (Particle) Verbs across Frequency Ranges

In this paper we explore the role of verb frequencies and the number of clusters in soft-clustering approaches as a tool for automatic semantic classification. Relying on a large-scale setup including 4,871 base verb types and 3,173 complex verb types, and focusing on synonymy as a task-independent goal in semantic classification, we demonstrate that low-frequency German verbs are clustered sign...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Disambiguation of the Semantics of German Prepositions: a Case Study

In this paper, we describe our experiments in preposition disambiguation based on a – compared to a previous study – revised annotation scheme and new features derived from a matrix factorization approach as used in the field of distributional semantics. We report on the annotation and Maximum Entropy modelling of the word senses of two German prepositions, mit (‘with’) and auf (‘on’). 500 occu...

Semantic Preserving Data Reduction using Artificial Immune Systems

Artificial Immune Systems (AIS) can be defined as soft computing systems inspired by the immune system of vertebrates. The immune system is an adaptive pattern recognition system. AIS have been used in pattern recognition, machine learning, optimization and clustering. Feature reduction refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encoun...

Detection and Classification of Breast Cancer in Mammography Images Using Pattern Recognition Methods

Introduction: In this paper, a method is presented to classify breast cancer masses according to new geometric features. Methods: After obtaining digital breast mammogram images from the digital database for screening mammography (DDSM), image preprocessing was performed. Then, by using image processing methods, an algorithm was developed for the automatic extraction of masses from other norma...

Publication date: 2016