Towards Efficient and Improved Hierarchical Clustering With Instance and Cluster Level Constraints

Authors

  • Ian Davidson
  • S. S. Ravi
Abstract

Many clustering applications use computationally efficient non-hierarchical clustering techniques such as k-means. However, less efficient hierarchical clustering is desirable for two reasons: by creating a dendrogram, the user can choose an appropriate value of k (the number of clusters), and in some domains cluster hierarchies (i.e., clusters within other clusters) exist naturally. In many situations, a priori constraints or information are available, for example in the form of a small amount of labeled data. In this paper we explore using constraints to improve the efficiency of agglomerative clustering algorithms. We show that merely finding a feasible solution (one satisfying all constraints) is NP-complete for some constraint combinations, which should therefore be avoided. For a given set of constraints we derive upper (kmax) and lower (kmin) bounds on the values of k for which feasible solutions exist. This allows a restricted dendrogram to be created, but its creation is not straightforward. For some combinations of constraints, starting with a feasible clustering solution (k = r) and joining the two closest clusters results in a “dead-end” feasible solution that cannot be further refined to create a feasible solution with r − 1 clusters, even though kmin < r − 1 < kmax. For such situations we introduce constraint-driven hierarchical clustering algorithms that create a complete dendrogram. When traditional algorithms can be used, we illustrate the use of the triangle inequality and a newly defined γ constraint to further improve performance, and we use the Markov inequality to bound the expected performance improvement. Preliminary results indicate that using constraints can improve the dendrogram quality.
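
The idea of a restricted dendrogram can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the instance-level constraints are must-link and cannot-link pairs, uses single-linkage Euclidean distance, and simply stops when no feasible merge remains. The function name constrained_agglomerative and its parameters are hypothetical.

```python
import math
from itertools import combinations


def constrained_agglomerative(points, must_link, cannot_link):
    """Greedy single-linkage merging that never violates a cannot-link pair.

    Hypothetical helper for illustration only. Returns the list of
    clusterings (each a list of frozensets of point indices) from the
    first feasible level down to the level where merging has to stop.
    """
    n = len(points)

    # Union-find: points joined by must-link pairs start in the same cluster.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    clusters = [frozenset(g) for g in groups.values()]

    # A cannot-link pair inside a must-link group makes the problem infeasible.
    for c in clusters:
        if any(a in c and b in c for a, b in cannot_link):
            raise ValueError("constraints are infeasible")

    def violates(c1, c2):
        # True if merging c1 and c2 would join a cannot-link pair.
        return any((a in c1 and b in c2) or (a in c2 and b in c1)
                   for a, b in cannot_link)

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    levels = [list(clusters)]
    while len(clusters) > 1:
        candidates = [
            (linkage(c1, c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
            if not violates(c1, c2)
        ]
        if not candidates:
            break  # no feasible merge is left, so the dendrogram ends here
        _, i, j = min(candidates)
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append(list(clusters))
    return levels


# Usage: five 2-D points, one must-link pair, two cannot-link pairs.
pts = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0)]
levels = constrained_agglomerative(pts, [(0, 1)], [(0, 2), (2, 4)])
print([len(level) for level in levels])  # [4, 3, 2]
```

In this toy run the must-link pair fixes the top of the restricted dendrogram at four clusters, and the cannot-link pairs stop merging at two. A greedy procedure like this one can also stop before the smallest feasible k is reached, which is the "dead-end" situation the abstract describes and which the paper's constraint-driven algorithms are meant to address.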

Similar resources

MLCA: A Multi-Level Clustering Algorithm for Routing in Wireless Sensor Networks

Energy constraints are the biggest challenge in wireless sensor networks because the power supply of each sensor node is a battery that, given the applications of these networks, cannot be recharged or replaced. One of the successful methods for saving energy in these networks is clustering. As a result, cluster-based routing algorithms have become successful routing algorithms for these networks....

On the Comparison of Semi-Supervised Hierarchical Clustering Algorithms in Text Mining Tasks

Semi-supervised clustering approaches have emerged as an option for enhancing clustering results. These algorithms use external information to guide the clustering process. In particular, semi-supervised hierarchical clustering approaches have been explored in many fields in recent years. These algorithms provide efficient and personalized hierarchical overviews of datasets. To the best of th...

Clustering Trees with Instance Level Constraints

Constrained clustering investigates how to incorporate domain knowledge in the clustering process. The domain knowledge takes the form of constraints that must hold on the set of clusters. We consider instance-level constraints, such as must-link and cannot-link. This type of constraint has been successfully used in popular clustering algorithms, such as k-means and hierarchical agglomerative ...

A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)

Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using metaheuristic algorithms. In this study, we propose a simulated-annealing-based algorithm for K-means in the clustering analysis which we refer it a...

Creating a Cluster Hierarchy under Constraints of a Partially Known Hierarchy

Although clustering under constraints is a current research topic, a hierarchical setting, in which a hierarchy of clusters is the goal, is usually not considered. This paper tries to fill this gap by analyzing a scenario in which constraints are derived from a hierarchy that is partially known in advance. This scenario can be found, e.g., when structuring a collection of documents according to a...

Publication date: 2005