Search results for: partitional clustering
Number of results: 103,004
We present a novel approach to cluster sets of protein sequences, based on Inductive Logic Programming (ILP). Preliminary results show that the method proposed produces understandable descriptions/explanations of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.
Classical model-based partitional clustering algorithms, such as k-means or mixture of Gaussians, provide only loose and indirect control over the size of the resulting clusters. In this work, we present a family of probabilistic clustering models that can be steered towards clusters of desired size by providing a prior distribution over the possible sizes, allowing the analyst to fine-tune exp...
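The abstract does not spell out the model family, so as a hedged illustration only, one generic way to make cluster sizes steerable is to attach a prior over the size vector to an otherwise standard Gaussian mixture likelihood (this is an assumed formulation, not necessarily the family proposed in the paper):

p(X, z | \theta) \;=\; \underbrace{p(n_1,\dots,n_K)}_{\text{size prior}} \;\prod_{k=1}^{K}\;\prod_{i:\,z_i=k} \mathcal{N}(x_i \mid \mu_k, \Sigma_k),
\qquad n_k = |\{\, i : z_i = k \,\}| .

With a sharply peaked size prior, assignments that yield unwanted cluster sizes are penalized; a flat prior recovers the unconstrained mixture.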
Centroid-based clustering algorithms depend on the number of clusters, the initial centroids, the distance measure, and the statistical approach to central tendency. Centroid initialization determines convergence speed, computational efficiency, execution time, scalability, memory utilization, and overall performance for big-data clustering. Various researchers have proposed clustering techniques, where some ...
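As context for the knobs the abstract lists, the sketch below compares random and k-means++ seeding with scikit-learn's KMeans (an assumed, off-the-shelf implementation; it is not the survey's own technique and uses Euclidean distance only):

# Illustrative only: how centroid initialization affects iterations and final inertia.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2000, centers=5, random_state=0)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=5, init=init, n_init=1, random_state=0).fit(X)
    print(f"init={init:10s}  iterations={km.n_iter_:3d}  inertia={km.inertia_:.1f}")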
K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, the nearest neighbor search step of this algorithm can be computationally expensive, as the distances between each input vector and all cluster centers need to be calculated. To accelerate this step, a computationally inexpensive distance estimation method can be tried first, resulting in the rejection o...
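The specific estimator is truncated above; a hedged sketch of the general idea (partial-distance rejection, a standard trick, not necessarily the paper's method) is to accumulate the squared distance dimension by dimension and abandon a center as soon as the running sum exceeds the best distance found so far:

import numpy as np

def nearest_center_partial(x, centers):
    best_idx, best_dist = 0, float("inf")
    for j, c in enumerate(centers):
        acc = 0.0
        for d in range(len(x)):
            acc += (x[d] - c[d]) ** 2
            if acc >= best_dist:        # cheap early rejection: cannot be the nearest
                break
        else:                           # inner loop completed: acc is the full distance
            if acc < best_dist:
                best_idx, best_dist = j, acc
    return best_idx, best_dist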
Clustering problems appear in a wide range of unsupervised classification applications such as pattern recognition, vector quantization, data mining and knowledge discovery. The k-means algorithm is one of the most widely used clustering techniques. Unfortunately, k-means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is qui...
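The abstract's own remedy is cut off; for context, a widely used fix for poor initial centers is k-means++ seeding (D² sampling), sketched below. This is a well-known technique named here explicitly, not a claim about the paper's contribution:

import numpy as np

def kmeanspp_seeds(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]        # first center drawn uniformly
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        # sample the next center with probability proportional to D^2
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)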
Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Though many effective and efficient clustering algorithms have been developed and deployed, most of them still suffer from the lack of automatic or online decision for optimal number of clusters. In this paper, we define clu...
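The paper's own criterion is truncated above; as general context, one standard way to choose the number of clusters automatically is to scan candidate values of k and keep the one that maximizes an internal index such as the silhouette score (assumed scikit-learn utilities; not the authors' method):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
          for k in range(2, 9)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])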
For hierarchical clustering, dendrograms provide convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with ...
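The paper's extension is not reproduced here; the baseline idea of dissimilarity matrix shading is simply to reorder the pairwise dissimilarity matrix by cluster label so that within-cluster blocks line up along the diagonal, as in this hedged sketch:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
order = np.argsort(labels)                       # group points by cluster
D = squareform(pdist(X))[order][:, order]        # reordered dissimilarity matrix
plt.imshow(D, cmap="gray_r")
plt.title("Shaded dissimilarity matrix (cluster-ordered)")
plt.show()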
The assessment of stability in cluster analysis is strongly related to the main difficult problem of determining the number of clusters present in the data. The latter is the subject of many investigations and papers considering different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to inve...
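The abstract breaks off before the details, so here is only a hedged sketch of the general recipe behind resampling-based stability (not necessarily the paper's exact scheme): draw bootstrap resamples, fit k-means on each, assign the original points to the fitted centroids, and measure agreement between replicates with the adjusted Rand index; stable values of k give high agreement.

import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
rng = np.random.default_rng(0)

def bootstrap_labels(X, k, n_rep=10):
    all_labels = []
    for _ in range(n_rep):
        idx = rng.integers(0, len(X), len(X))               # bootstrap resample
        km = KMeans(n_clusters=k, n_init=5, random_state=0).fit(X[idx])
        all_labels.append(km.predict(X))                    # labels for the original data
    return all_labels

for k in (2, 3, 4, 5, 6):
    labs = bootstrap_labels(X, k)
    agree = np.mean([adjusted_rand_score(a, b) for a, b in combinations(labs, 2)])
    print(f"k={k}: mean pairwise ARI = {agree:.2f}")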
We consider the problem of optimal partitional clustering of real data sets by optimizing three basic criteria (trace of within scatter matrix, variance ratio criterion, and Marriott's criterion). Four variants of the algorithm based on differential evolution with competing strategies are compared on eight real-world data sets. The experimental results showed that hybrid variants with k-means ...
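The three criteria named above have standard definitions, computed from a labeling as follows (the differential-evolution optimizer itself is not reproduced here):

import numpy as np

def within_scatter(X, labels):
    # pooled within-cluster scatter matrix W
    W = np.zeros((X.shape[1], X.shape[1]))
    for k in np.unique(labels):
        Xk = X[labels == k] - X[labels == k].mean(axis=0)
        W += Xk.T @ Xk
    return W

def criteria(X, labels):
    n, K = len(X), len(np.unique(labels))
    W = within_scatter(X, labels)
    T = within_scatter(X, np.zeros(n, dtype=int))            # total scatter (single cluster)
    B = T - W                                                 # between-cluster scatter
    trace_w = np.trace(W)                                     # trace of within scatter: minimize
    vrc = (np.trace(B) / (K - 1)) / (np.trace(W) / (n - K))  # variance ratio criterion: maximize
    marriott = K**2 * np.linalg.det(W)                        # Marriott's criterion: minimize
    return trace_w, vrc, marriott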
The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a locally optimal solution that strongly depends on its initial seeds. Bad initial seeds can also cause the splitting or merging of natural clusters even if the clusters are well separated. In this paper, we propose ROBIN, a novel method for initial seed se...