DENCLUE 2.0: Fast Clustering Based on Kernel Density Estimation
Abstract
The Denclue algorithm employs a cluster model based on kernel density estimation. A cluster is defined by a local maximum of the estimated density function. Data points are assigned to clusters by hill climbing, i.e. points that climb to the same local maximum are put into the same cluster. A disadvantage of Denclue 1.0 is that its hill climbing may take unnecessarily small steps at the beginning and never converges exactly to the maximum; it only comes close. We introduce a new hill climbing procedure for Gaussian kernels, which adjusts the step size automatically at no extra cost. We prove that the procedure converges exactly towards a local maximum by reducing it to a special case of the expectation maximization algorithm. We show experimentally that the new procedure needs far fewer iterations and can be accelerated by sampling-based methods while sacrificing only a small amount of accuracy.
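The hill climbing described in the abstract can be illustrated with a small sketch. It assumes the step-size-free update takes the form of a Gaussian-kernel-weighted mean of the data points (the fixed-point iteration also familiar from mean shift); the abstract itself does not spell out the formula. The function names, the tolerance `tol`, and the `merge_radius` used to decide that two climbs reached the same maximum are all hypothetical choices made for this example, and the sampling-based acceleration is not shown.

```python
import math


def gaussian_weight(x, xi, h):
    """Unnormalized Gaussian kernel weight between x and xi for bandwidth h."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * h * h))


def hill_climb(start, data, h, tol=1e-9, max_iter=1000):
    """Climb the kernel density estimate from `start`.

    Assumed update rule: the next position is the kernel-weighted mean of
    all data points, so no explicit step size has to be chosen.
    Returns the final position and the number of iterations used.
    """
    x = tuple(start)
    for iteration in range(1, max_iter + 1):
        weights = [gaussian_weight(x, xi, h) for xi in data]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed; nothing to climb towards
            return x, iteration
        new_x = tuple(
            sum(w * xi[d] for w, xi in zip(weights, data)) / total
            for d in range(len(x))
        )
        step = math.dist(new_x, x)
        x = new_x
        if step < tol:  # position has stopped moving
            return x, iteration
    return x, max_iter


def cluster(data, h, merge_radius):
    """Give the same label to points whose climbs end near the same maximum."""
    modes, labels = [], []
    for point in data:
        end, _ = hill_climb(point, data, h)
        for k, mode in enumerate(modes):
            if math.dist(end, mode) < merge_radius:
                labels.append(k)
                break
        else:
            modes.append(end)
            labels.append(len(modes) - 1)
    return labels, modes


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    labels, modes = cluster(points, h=0.5, merge_radius=0.1)
    print(labels)
    print(modes)
```

Because every new position is a convex combination of the data points, the iterate stays inside their convex hull, and the size of each move shrinks on its own as the weighted mean approaches a density maximum.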
Similar resources
Comparative Study of Density based Clustering Algorithms
This paper presents a comparative study of three density-based clustering algorithms: DENCLUE, DBCLASD and DBSCAN. Six parameters are considered for their comparison. The results are supported by experimental evaluation. This analysis helps in finding the appropriate density-based clustering algorithm in various situations. General Terms: Algorithms.
Fast Estimation of Nonparametric Kernel Density Through PDDP, and its Application in Texture Synthesis
In this work, a new algorithm is proposed for fast estimation of nonparametric multivariate kernel density, based on principal direction divisive partitioning (PDDP) of the data space. The goal of the proposed algorithm is to use the finite support property of kernels for fast estimation of density. Compared to earlier approaches, this work explains the need of using boundaries (for partitioning the sp...
Comparison of the Gamma kernel and the orthogonal series methods of density estimation
The standard kernel density estimator suffers from boundary bias when estimating probability density functions of distributions on the positive real line. The Gamma kernel estimators and orthogonal series estimators are two alternatives that are free of boundary bias. In this paper, a simulation study is conducted to compare the small-sample performance of the Gamma kernel estimators and the orthog...
An Efficient Approach to Clustering in Large Multimedia Databases with Noise
Several clustering algorithms can be applied to clustering in large multimedia databases. The effectiveness and efficiency of the existing algorithms, however, are somewhat limited, since clustering in multimedia databases requires clustering high-dimensional feature vectors and since multimedia databases often contain large amounts of noise. In this paper, we therefore introduce a new algorithm...