k means clustering algorithm

A hybrid clustering algorithm for data mining

Journal: :CoRR 2012

Ravindra Jain

Data clustering is the process of arranging similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is greater than the similarity between groups. This paper describes a hybrid clustering algorithm based on K-means and K-harmonic means (KHM). The proposed algorithm is tested on five different datasets. The research is focused on fast an...
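As a rough illustration of the two building blocks named in this abstract, the sketch below evaluates the standard K-means objective (sum of squared distances to the nearest center) and the standard K-harmonic means objective (harmonic average of the distances to all centers) for a fixed set of centers. How the paper combines the two is not stated in the excerpt, so no hybrid scheme is attempted here; the function names, the exponent default `p=2.0` and the `eps` guard are assumptions made for the example.

```python
import math

def k_means_objective(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(math.dist(x, c) ** 2 for c in centers) for x in points)

def khm_objective(points, centers, p=2.0, eps=1e-12):
    """K-harmonic means objective: sum_i k / sum_j (1 / d(x_i, c_j)^p).

    eps is an assumed guard against division by zero when a point
    coincides with a center.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inverse_sum = sum(1.0 / max(math.dist(x, c), eps) ** p for c in centers)
        total += k / inverse_sum
    return total

# Hypothetical one-dimensional example: one point, two centers.
example_points = [[0.0]]
example_centers = [[1.0], [3.0]]
km_value = k_means_objective(example_points, example_centers)
khm_value = khm_objective(example_points, example_centers)
```

Because every center contributes to the KHM value, a far-away center still influences the objective, which is the usual argument for KHM being less sensitive to initialization than plain K-means.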


Validity Index for Fuzzy K-Means Clustering Using the Gap Statistic Method

2005

Chinatsu Arima Kazumi Hakamada Masahiro Okamoto Taizo Hanai


Development of the Clustering Algorithms Ensemble Based on Varying Distances Metrics

2016

Pyotr Bochkaryov Vasiliy Kireev

Large volumes of data are currently being accumulated in various information environments, such as social, corporate, scientific and others. The intensive use of big data in many fields is stimulating growing interest among researchers in developing methods and tools for processing and analyzing massive data of enormous volume and considerable variety. One of the promising...


Retrospective Analysis of Software Projects using k-Means Clustering

Journal: :Softwaretechnik-Trends 2010

Steffen Herbold Jens Grabowski Helmut Neukirchen Stephan Waack

Software projects are usually analyzed by experts based on their previous experience, their intuition and data they gather about the project. In this work, we show an approach for a purely data-driven retrospective project analysis. We plan to build on this work to make predictions about the evolution of software projects.


Normalization based K means Clustering Algorithm

Journal: :CoRR 2015

Deepali Virmani Taneja Shweta Geetika Malhotra

K-means is an effective clustering technique that separates similar data into groups based on the initial cluster centroids. This paper proposes a normalization-based K-means clustering algorithm (N-K means). The proposed N-K means algorithm normalizes the available data before clustering and calculates initial centroids based on weights...
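The idea of normalizing before clustering can be sketched as follows. This is a minimal illustration, not the N-K means algorithm itself: it assumes min-max scaling as the normalization, uses plain Lloyd iterations, and takes the initial centroids as an argument because the weight-based initialization mentioned in the abstract is cut off in the excerpt. All function names and the sample data are hypothetical.

```python
import math

def min_max_normalize(points):
    """Rescale every feature (column) to the range [0, 1]."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for point in points:
        row = []
        for value, low, high in zip(point, lows, highs):
            # A constant column carries no information; map it to 0.0.
            row.append(0.0 if high == low else (value - low) / (high - low))
        normalized.append(row)
    return normalized

def k_means(points, initial_centroids, max_iterations=100):
    """Plain Lloyd iteration; returns (centroids, labels)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: the second feature has a far larger scale than the first.
data = [[1.0, 1000.0], [1.2, 1400.0], [9.0, 1100.0], [9.5, 1500.0]]
scaled = min_max_normalize(data)
centroids, labels = k_means(scaled, initial_centroids=[scaled[0], scaled[2]])
```

Without the scaling step, Euclidean distances on this sample would be dominated by the second feature; after scaling, both features contribute on the same footing.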


PROBI: A Heuristic for the probabilistic k-median problem

Journal: :CoRR 2013

Hendrik Fichtenberger Melanie Schmidt

We develop the heuristic PROBI for the probabilistic Euclidean k-median problem based on a coreset construction by Lammersen et al. [28]. Our algorithm computes a summary of the data and then uses an adapted version of k-means++ [5] to compute a good solution on the summary. The summary is maintained in a data stream, so PROBI can be used in a data stream setting on very large data sets. We exp...
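For readers unfamiliar with k-means++, the sketch below shows the standard seeding step: the first center is drawn uniformly, and each further center is drawn with probability proportional to the squared distance to the nearest center chosen so far. This is the textbook procedure only; the adaptation used by PROBI and its coreset construction are not described in the excerpt and are not reproduced. The function name, the `rng` parameter and the sample points are assumptions for illustration.

```python
import math
import random

def k_means_pp_seeds(points, k, rng=None):
    """Standard k-means++ seeding; returns up to k initial centers."""
    rng = rng or random.Random()
    seeds = [list(rng.choice(points))]
    while len(seeds) < k:
        # Squared distance from each point to its nearest chosen seed.
        weights = [min(math.dist(p, s) ** 2 for s in seeds) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a seed; no further distinct seed exists.
            break
        # Sample the next seed with probability proportional to its weight.
        seeds.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return seeds

# Hypothetical data with three visible groups.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
seeds = k_means_pp_seeds(points, k=3, rng=random.Random(0))
```

Points that are already seeds have weight zero, so they are never drawn twice, and distant points are favored, which tends to spread the initial centers across the data.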


Efficient Hierarchical Clustering of Large Data Sets Using P-trees

2002

Anne M. Denton Qiang Ding William Perrizo Qin Ding

Hierarchical clustering methods have attracted much attention by giving the user a maximum amount of flexibility. Rather than requiring parameter choices to be predetermined, the result represents all possible levels of granularity. In this paper a hierarchical method is introduced that is fundamentally related to partitioning methods, such as k-medoids and k-means as well as to a density based...


Obtaining Findings on Software Usage Using Text Mining

2015

Deniz Kilinç Fatma Bozyigit Akin Özçift Fatih Yücalar Emin Borandag

Abstract. Software technologies are advancing rapidly, and in parallel the number of software projects carried out in both the public and the private sector is increasing. One of the most significant outputs of software automation projects is undoubtedly the data they produce. Processing this high-dimensional, hard-to-understand data and transforming it into more meaningful and actionable data is an important nee...


Accuracy Assessment of Cloud Reconstruction Approaches using Segmentation

2013

E. Menaka Suresh Kumar

Clouds are the major obstacle to analyzing data in satellite images. Various approaches, namely in-painting and multi-temporal methods, are used to remove clouds from a satellite image before further processing. However, the algorithms behind these approaches cannot produce accurate results, so an accuracy assessment helps to motivate improvements in accuracy. The main ...


Distributed PCA and k-Means Clustering

2013

Yingyu Liang Maria-Florina Balcan Vandana Kanchanapally

This paper proposes a distributed PCA algorithm, with the theoretical guarantee that any good approximation solution on the projected data for k-means clustering is also a good approximation on the original data, while the projected dimension required is independent of the original dimension. When combined with the distributed coreset-based clustering approach in [3], this leads to an algorithm...
