Characteristics of a Hierarchical Data Clustering Algorithm Based on Gravity Theory

Authors

  • Yen-Jen Oyang
  • Chien-Yu Chen
  • Shien-Ching Hwang
  • Cheng-Fang Lin
Abstract

Clustering algorithms that output a hierarchical dendrogram are classified as hierarchical clustering algorithms. The most desirable feature of a hierarchical clustering algorithm is that it generates a hierarchical dendrogram. This feature is very important for applications such as biological, social, and behavioral studies, because these fields need to construct taxonomies. One general problem of modern hierarchical data clustering algorithms is that clustering quality depends heavily on how certain parameters are set. What makes the situation even more complicated is that the optimal parameter setting is data dependent. As a result, different parts of a given data set may require different parameter settings to optimize clustering quality, and applying a global parameter setting to the entire data set may ruin the final result. In such cases, parameter tuning may require human intervention, which is not only time consuming but may also become cumbersome for the user if the dimension of the data set is high. This paper presents the main characteristics of a hierarchical clustering algorithm that overcomes the parameter-tuning problem and delivers favorable clustering quality. The proposed hierarchical clustering algorithm is based on gravity theory in physics. The studies presented in this paper reveal that the optimal ranges for the parameters of the proposed gravity-based clustering algorithm are wide and essentially not data dependent. Therefore, parameter tuning is essentially not required. Another major feature of the proposed gravity-based algorithm is that it achieves favorable clustering quality in comparison with conventional hierarchical clustering algorithms that require no parameter tuning.

Keywords: data clustering, hierarchical clustering, clustering quality, gravity theory.
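The abstract describes the gravity-based approach only at a high level. As a rough illustration, and not the authors' algorithm, the Python sketch below simulates unit-mass nodes that attract one another under an inverse-square force; when two nodes come closer than a threshold they merge into one node whose mass is the sum of the two and whose position is their center of mass. The order of these merges forms a dendrogram. The function name `gravity_cluster`, the constants `G`, `DT`, `MERGE_DIST`, `MAX_MOVE`, and `MAX_STEPS`, and the momentum-free position update are all hypothetical choices made for this example.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical simulation parameters chosen for illustration only;
# they are NOT the parameters used in the paper.
G = 1.0                    # gravitational constant
DT = 0.01                  # time step
MERGE_DIST = 0.05          # nodes closer than this collide and merge
MAX_MOVE = MERGE_DIST / 2  # cap on per-step displacement (assumption, avoids overshooting)
MAX_STEPS = 100_000        # safety bound on the number of simulation steps

# A node is (position, mass, set of original point indices it contains).
Node = Tuple[List[float], float, FrozenSet[int]]


def _dist(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def gravity_cluster(points: Sequence[Sequence[float]]) -> List[Tuple[FrozenSet[int], FrozenSet[int]]]:
    """Return the merge sequence (leaves to root) of a toy gravity simulation."""
    nodes: List[Node] = [(list(p), 1.0, frozenset([i])) for i, p in enumerate(points)]
    merges: List[Tuple[FrozenSet[int], FrozenSet[int]]] = []
    for _ in range(MAX_STEPS):
        # Merge colliding pairs one at a time: masses add, position becomes the center of mass.
        merged = True
        while merged and len(nodes) > 1:
            merged = False
            for i in range(len(nodes)):
                for j in range(i + 1, len(nodes)):
                    (pi, mi, si), (pj, mj, sj) = nodes[i], nodes[j]
                    if _dist(pi, pj) < MERGE_DIST:
                        pos = [(mi * a + mj * b) / (mi + mj) for a, b in zip(pi, pj)]
                        merges.append((si, sj))
                        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
                        nodes.append((pos, mi + mj, si | sj))
                        merged = True
                        break
                if merged:
                    break
        if len(nodes) <= 1:
            break
        # Move every node along its net gravitational acceleration (no momentum is kept).
        new_positions = []
        for i, (pi, _, _) in enumerate(nodes):
            acc = [0.0] * len(pi)
            for j, (pj, mj, _) in enumerate(nodes):
                if i == j:
                    continue
                d = _dist(pi, pj)  # d >= MERGE_DIST > 0 because close pairs were merged above
                for k in range(len(pi)):
                    acc[k] += G * mj * (pj[k] - pi[k]) / d ** 3
            step = [a * DT for a in acc]
            norm = math.sqrt(sum(s * s for s in step))
            if norm > MAX_MOVE:
                step = [s * MAX_MOVE / norm for s in step]
            new_positions.append([p + s for p, s in zip(pi, step)])
        nodes = [(pos, m, s) for pos, (_, m, s) in zip(new_positions, nodes)]
    return merges
```

For example, `gravity_cluster([(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)])` merges the points inside each of the two tight groups first and joins the two groups in the final merge. The displacement cap keeps two nearby nodes from overshooting each other in a single step; a faithful implementation would need the paper's own force law, time step, and merging rule.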

Similar resources

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory

This paper discusses the clustering quality and complexities of the hierarchical data clustering algorithm based on gravity theory. The gravity-based clustering algorithm simulates how the given N nodes in a K-dimensional continuous vector space will cluster due to the gravity force, provided that each node is associated with a mass. One of the main issues studied in this paper is how the order ...

An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory

One of the main challenges in the design of modern clustering algorithms is that, in many applications, new data sets are continuously added into an already huge database. As a result, it is impractical to carry out data clustering from scratch whenever there are new data instances added into the database. One way to tackle this challenge is to incorporate a clustering algorithm that operates i...

A Novel Hybrid Clustering Method Using an Artificial Immune System and Hierarchical Clustering

The artificial immune system (AIS) is one of the most widely used meta-heuristic algorithms for solving complex problems. With large amounts of data, making rapid decisions and obtaining stable results are the most challenging tasks, owing to rapid variation in the real world. Clustering is one possible solution to these problems. The goal of clustering analysis is to group similar objects. AIS algor...

Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers

Given that the main weakness of most standard methods, including k-means and hierarchical data clustering, is their sensitivity to initialization and their tendency to become trapped in local minima, this paper proposes a modification of convex data clustering in which there is no need to be particular about how initial values are selected. Owing to the proper conversion of the optimization task into an equivalent...

Publication date: 2001