A Clustering Algorithm For Chinese Adjectives And Nouns
Abstract
This paper proposes a bidirectional hierarchical clustering algorithm for simultaneously clustering words of different parts of speech based on collocations. The algorithm is composed of cycles that alternate between two kinds of clustering processes. We construct an objective function based on Minimum Description Length. To partly solve the problem caused by sparse data, two concepts, collocational degree and revisional distance, are presented.
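The abstract gives no formulas, so the following is only a minimal sketch of the general idea: adjective clusters and noun clusters are merged in alternating steps, and a merge is accepted only if it lowers a description-length objective. Everything concrete here is an assumption and not the paper's formulation: the function names, the two-part cost (data bits under a block model plus a parameter penalty), and the greedy merge order. The paper's collocational degree and revisional distance are not modelled.

```python
import math
from itertools import combinations

def description_length(counts, adj_clusters, noun_clusters):
    """Assumed two-part MDL cost in bits: data cost plus model cost."""
    total = sum(sum(row) for row in counts)
    data_bits = 0.0
    for A in adj_clusters:
        for N in noun_clusters:
            block = sum(counts[a][n] for a in A for n in N)
            if block:
                # Assumption: pairs inside a block share its probability equally.
                p = block / total / (len(A) * len(N))
                data_bits -= block * math.log2(p)
    # Assumption: one free parameter per block, minus one for normalisation.
    free_params = len(adj_clusters) * len(noun_clusters) - 1
    model_bits = 0.5 * free_params * math.log2(total)
    return data_bits + model_bits

def best_merge(counts, adj_clusters, noun_clusters, side):
    """Return (cost, clusters) for the cheapest pairwise merge on one side."""
    clusters = adj_clusters if side == "adj" else noun_clusters
    best = None
    for i, j in combinations(range(len(clusters)), 2):
        merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
        merged.append(clusters[i] | clusters[j])
        if side == "adj":
            cost = description_length(counts, merged, noun_clusters)
        else:
            cost = description_length(counts, adj_clusters, merged)
        if best is None or cost < best[0]:
            best = (cost, merged)
    return best  # None when fewer than two clusters remain

def bidirectional_cluster(counts):
    """Alternate adjective-side and noun-side merges while the cost drops."""
    adj = [frozenset([i]) for i in range(len(counts))]
    noun = [frozenset([j]) for j in range(len(counts[0]))]
    cost = description_length(counts, adj, noun)
    improved = True
    while improved:
        improved = False
        for side in ("adj", "noun"):
            candidate = best_merge(counts, adj, noun, side)
            if candidate is not None and candidate[0] < cost:
                cost = candidate[0]
                if side == "adj":
                    adj = candidate[1]
                else:
                    noun = candidate[1]
                improved = True
    return adj, noun, cost
```

Here `counts[a][n]` is a hypothetical table of how often adjective `a` collocates with noun `n`. Merging stops on its own once no merge on either side reduces the cost, so the number of clusters does not have to be fixed in advance.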
Similar resources
Exploring the value space of attributes: Unsupervised bidirectional clustering of adjectives in German
The paper presents an iterative bidirectional clustering of adjectives and nouns based on a co-occurrence matrix. The clustering method combines a Vector Space Model (VSM) with the results of a Latent Dirichlet Allocation (LDA), which are merged in each iterative step. The aim is to derive a clustering of German adjectives that reflects latent semantic classes of adjectives, and that can...
The Other Pole of Degree Modification of Gradable Nouns by Size Adjectives: A Mandarin Chinese Perspective
Size adjectives can have degree readings when they modify gradable nouns. However, cross-linguistic variation exists with respect to what type(s) of size adjectives in a particular language can have such readings. In English, degree readings are available only for size adjectives that predicate bigness, and in Mandarin Chinese, degree readings are available for all size adjectives irrespective ...
Semantic Clustering in Dutch Automatically inducing semantic classes from large-scale corpora
Handcrafting semantic classes is a difficult and time-consuming job, and depends on human interpretation. Possible machine learning techniques would be much faster, and do not rely on interpretation, because they stick to the data. The goal of this research is to present some machine learning techniques that make it possible to achieve an automatic clustering of Dutch words. More particularly, ...
Semantic Classification of Chinese Unknown Words
This paper describes a classifier that assigns semantic thesaurus categories to unknown Chinese words (words not already in the CiLin thesaurus and the Chinese Electronic Dictionary, but in the Sinica Corpus). The focus of the paper differs in two ways from previous research in this particular area. Prior research in Chinese unknown words mostly focused on proper nouns (Lee 1993, Lee, Lee and C...
Building a Chinese Lexical Taxonomy
In this paper, we present a Chinese lexical taxonomy, a hierarchical organization of Chinese lexical classes of nouns, verbs and adjectives. We first describe the structure of this taxonomy and then present the methods we used to build it. The distinctive characteristics of this lexical taxonomy are: 1) we use a definition frame to describe each lexical class, as well as its members, 2) the lex...