k-means clustering

A bad 2-dimensional instance for k-means++

Journal: CoRR 2013

Ragesh Jaiswal, Prachi Jain, Saumya Yadav

The k-means++ seeding algorithm is one of the most popular algorithms used for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from among the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean dist...
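The sampling procedure described in this abstract can be sketched in a few lines. The abstract is truncated, so the sketch assumes the standard k-means++ rule: each new center is drawn with probability proportional to the squared Euclidean distance from a point to its nearest already-chosen center. The names `sq_dist` and `kmeanspp_seed` are hypothetical helpers introduced for illustration, not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    # First center: chosen uniformly at random from the given points.
    centers = [rng.choice(points)]
    # d2[j] holds the squared distance from points[j] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a chosen center,
            # fall back to uniform sampling instead of failing.
            nxt = rng.choice(points)
        else:
            # i-th center (i > 1): probability proportional to squared distance.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, nxt)) for p, old in zip(points, d2)]
    return centers


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeanspp_seed(points, 2, random.Random(0)))
```

A point that is already a center has weight zero, so with distinct input points the sketch never picks the same point twice.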


On Clustering Histograms with k-Means by Using Mixed α-Divergences

Journal: Entropy 2014

Frank Nielsen, Richard Nock, Shun-ichi Amari

Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retr...


Wasserstein k-means++ for Cloud Regime Histogram Clustering

2017

Matthew Staib, Stefanie Jegelka

Much work has sought to discern the different types of cloud regimes, typically via Euclidean k-means clustering of histograms. However, these methods ignore the underlying similarity structure of cloud types. Wasserstein k-means clustering is a promising candidate for utilizing this structure during clustering, but existing algorithms do not scale well and lack the quality guarantees of the Eu...


On the Consistency of k-means++ algorithm

Journal: CoRR 2017

Mieczyslaw A. Klopotek

We prove in this paper that the expected value of the objective function of the k-means++ algorithm for samples converges to the population expected value. Since k-means++ provides a constant-factor approximation of the k-means objective on samples, such an approximation can be achieved for the population as the sample size increases. This result is of potential practical relevance when one is...


k-variates++: more pluses in the k-means++ (Supplementary Information)

2016

Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen

This is the Supplementary Information to the paper "k-variates++: more pluses in the k-means++", appearing in the proceedings of ICML 2016. The notation "main file" indicates a reference to the paper.


The k-means-u* algorithm: non-local jumps and greedy retries improve k-means++ clustering

Journal: CoRR 2017

Bernd Fritzke

We present a new clustering algorithm called k-means-u* which in many cases is able to significantly improve the clusterings found by k-means++, the current de facto standard for clustering in Euclidean spaces. First we introduce the k-means-u algorithm, which starts from a result of k-means++ and attempts to improve it with a sequence of non-local "jumps" alternating with runs of standard k-means....


A Tight Lower Bound Instance for k-means++ in Constant Dimension

2014

Anup Bhattacharya, Ragesh Jaiswal, Nir Ailon

The k-means++ seeding algorithm is one of the most popular algorithms used for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean distance o...


A Novel architecture for improving performance under virtualized environments

Journal: CoRR 2015

A. P. Nirmala, R. Sridaran

Even though virtualization provides many advantages in cloud computing, it does not provide effective performance isolation between virtual machines. In other words, performance may be affected by interference from co-located virtual machines. Isolation can be achieved by properly managing resource allocation among the virtual machines running simultaneously. This pa...


Improving k-means clustering with an improved genetic algorithm

Thesis: Foolad Industrial Non-Governmental, Non-Profit Institute of Higher Education, Faculty of Basic Sciences, 1392 (Solar Hijri)

سمیه رویین تن اردکانی, محمد نادری دهکردی, محمدعلی منتظری,

Clustering is a data-mining technique that takes a set of items and places them into clusters based on their features. Items within a cluster are as similar as possible to one another in a particular, predetermined feature, and items in different clusters differ as much as possible in that feature. There are many kinds of clustering, of which k-means is one of the best and simplest. Because this clustering method is the basis of some...

On the k-Means/Median Cost Function

Journal: CoRR 2017

Anup Bhattacharya, Ragesh Jaiswal

In this work, we study the k-means cost function. The (Euclidean) k-means problem can be described as follows: given a dataset $X \subseteq \mathbb{R}^d$ and a positive integer $k$, find a set of $k$ centers $C \subseteq \mathbb{R}^d$ such that $\Phi(C, X) \stackrel{\text{def}}{=} \sum_{x \in X} \min_{c \in C} \|x - c\|^2$ is minimized. Let $\Delta_k(X) \stackrel{\text{def}}{=} \min_{C \subseteq \mathbb{R}^d} \Phi(C, X)$ denote the cost of the optimal k-means solution. It is simple to observe that for any dataset $X$, $\Delta_k(X)$ decreases as ...
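As a small illustration of the cost function Φ(C, X) defined in this abstract, the sketch below sums, over all data points, the squared Euclidean distance to the nearest center. The function name `kmeans_cost` and the toy data are assumptions made for the example, not taken from the paper.

```python
def kmeans_cost(centers, points):
    """Phi(C, X): sum over x in X of the squared distance to the nearest c in C."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        for x in points
    )


# Toy dataset on a line (embedded in the plane), chosen for easy hand-checking.
X = [(0, 0), (2, 0), (10, 0)]

print(kmeans_cost([(1, 0)], X))            # 1 + 1 + 81 = 83
print(kmeans_cost([(1, 0), (10, 0)], X))   # 1 + 1 + 0 = 2
```

Adding a center to C can never increase Φ(C, X), because each point's nearest-center distance can only stay the same or shrink.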
