Pairwise Nearest Neighbor Method Revisited
Authors: Olli Virmajoki
Abstract
The pairwise nearest neighbor (PNN) method, also known as Ward's method, belongs to the class of agglomerative clustering methods. The PNN method generates a hierarchical clustering using a sequence of merge operations until the desired number of clusters is obtained. At each step, it selects the cluster pair whose merge increases the given objective function value the least. The main drawback of the PNN method is its slowness: the time complexity of the fastest known exact implementation is lower bounded by Ω(N²), where N is the number of data objects. We consider several speed-up methods for the PNN method in the first publication. These methods maintain the precision of the method. Another method for speeding up the PNN method is investigated in the second publication, where we utilize a k-neighborhood graph to reduce the number of distance calculations and operations. A remarkable speed-up is achieved at the cost of a slight increase in distortion. The PNN method can also be adapted for multilevel thresholding, which can be seen as a 1-dimensional special case of the clustering problem. In the third publication, we show how this can be implemented efficiently in O(N log N) time, in comparison to a straightforward approach that requires O(N²). The merge philosophy is extended in the fourth publication by the iterative shrinking method. In the merge phase of the PNN method, the two nearest clusters are always joined. Instead, we remove one cluster at a time and reassign its data objects to the neighboring clusters. In this way, we get better clustering results, at the cost of an increase in the running time. The proposed method is also used as a crossover method in a genetic algorithm, which produces the best clustering results with respect to the minimization of intra-cluster variance. The PNN algorithm can also be applied to generating optimal clustering. 
In the fifth publication, we use a branch-and-bound technique for finding the best possible
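The merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes the objective is the total squared error, so that merging clusters a and b costs n_a·n_b/(n_a+n_b)·‖c_a − c_b‖² (the standard Ward criterion), and it searches all pairs at every step, which is the slow baseline the speed-up methods aim to improve on. The names `merge_cost` and `pnn` are hypothetical.

```python
import numpy as np

def merge_cost(n_a, c_a, n_b, c_b):
    """Increase in total squared error if clusters a and b are merged (Ward criterion)."""
    diff = c_a - c_b
    return (n_a * n_b) / (n_a + n_b) * float(diff @ diff)

def pnn(data, m):
    """Naive exact PNN: repeatedly merge the cheapest cluster pair until m clusters remain.

    data is a 2-D array-like of shape (N, d). Every step scans all pairs,
    so this sketch makes no attempt at the speed-ups discussed in the text.
    """
    data = np.asarray(data, dtype=float)
    sizes = [1] * len(data)                      # each object starts as its own cluster
    centroids = [x.copy() for x in data]
    members = [[i] for i in range(len(data))]
    while len(centroids) > m:
        best = None
        for a in range(len(centroids)):
            for b in range(a + 1, len(centroids)):
                cost = merge_cost(sizes[a], centroids[a], sizes[b], centroids[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        n = sizes[a] + sizes[b]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroids[a] = (sizes[a] * centroids[a] + sizes[b] * centroids[b]) / n
        sizes[a] = n
        members[a].extend(members[b])
        del centroids[b], sizes[b], members[b]
    labels = np.empty(len(data), dtype=int)
    for k, idx in enumerate(members):
        labels[idx] = k
    return np.array(centroids), labels
```

For example, `pnn([[0, 0], [0, 1], [10, 10], [10, 11]], 2)` groups the first two and the last two points. Stopping at a different `m` yields another level of the same merge hierarchy.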
Similar resources
Fast pairwise nearest neighbor based algorithm for multilevel thresholding
We propose a fast pairwise nearest neighbor (PNN)-based O(N log N) time algorithm for multilevel nonparametric thresholding, where N denotes the size of the image histogram. The proposed PNN-based multilevel thresholding algorithm is considerably faster than optimal thresholding. On a set of 8 to 16 bits-per-pixel real images, experimental results also reveal that the proposed method provides be...
Iterative shrinking method for generating clustering
The pairwise nearest neighbor method (PNN) generates the clustering of a given data set by a sequence of merge steps. In this paper, we propose an alternative solution for the merge-based approach by introducing an iterative shrinking method. The new method removes the clusters iteratively one by one until the desired number of clusters is reached. Instead of merging two nearby clusters, we remo...
Practical methods for speeding-up the pairwise nearest neighbor method
Timo Kaukoranta University of Turku Turku Center for Computer Science Department of Computer Science Lemminkäisenkatu 14A FIN-20520 Turku, Finland Abstract. The pairwise nearest neighbor (PNN) method is a simple and well-known method for codebook generation in vector quantization. In its exact form, it provides a good-quality codebook but at the cost of high run time. A fast exact algorithm was...
Fast PNN-based Clustering Using K-nearest Neighbor Graph
Search for nearest neighbor is the main source of computation in most clustering algorithms. We propose the use of nearest neighbor graph for reducing the number of candidates. The number of distance calculations per search can be reduced from O(N) to O(k) where N is the number of clusters, and k is the number of neighbors in the graph. We apply the proposed scheme within agglomerative clusteri...
Fast and space efficient PNN algorithm with delayed distance calculations
Clustering of a data set can be done by the well-known Pairwise Nearest Neighbor (PNN) algorithm. The algorithm is conceptually very simple and gives high quality solutions. A drawback of the method is the relatively large running time of the original (exact) implementation. Recently, an efficient version of the exact PNN algorithm has been introduced in the literature. In this paper we give a fa...