means cluster

K+ Means : An Enhancement Over K-Means Clustering Algorithm

Journal: CoRR 2017

Srikanta Kolay Kumar Sankar Ray Abhoy Chand Mondal

K-means (MacQueen, 1967) [1] is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure offers a simple way to partition a given data set into a predefined number of clusters, say K. Determining K is difficult, and it is not known in advance which value of K will partition the objects in line with our intuition. To overcome this probl...
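The baseline procedure this abstract refers to can be illustrated with a minimal sketch of standard K-means (Lloyd's iteration). This is not the K+ Means enhancement, whose details are cut off above. The function name `kmeans`, the random seeding, and the iteration cap are assumptions made for illustration.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iters=100, seed=0):
    """Plain K-means; K must be chosen by the caller (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial seeds
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, assign(points, centroids)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

Note that K is an input here, which is exactly the difficulty the abstract raises: the routine cannot tell whether the chosen K matches the structure of the data.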


Effects of Starting Quarter Score, Game Location, and Quality of Opposition in Quarter Score in Elite Women’s Basketball

2013

Ernesto Moreno Miguel A. Gómez Carlos Lago Jaime Sampaio

The aim of the present study was to identify the effects of a starting score-line on the game quarter final score (final quarter outcome except for the first period) when considering the quality of the opposition and game location. The sample comprised 1,456 game quarters from the Spanish women’s professional league (seasons 2009/2010 and 2010/2011). A k-means cluster analysis classified the ga...


Optimization heuristics for determining internal rating grading scales

Journal: Computational Statistics & Data Analysis 2010

Marianna Lyra J. Paha Sandra Paterlini Peter Winker

Basel II imposes regulatory capital on banks related to the default risk of their credit portfolio. Banks using an internal rating approach compute the regulatory capital from pooled probabilities of default. These pooled probabilities can be calculated by clustering credit borrowers into different buckets and computing the mean PD for each bucket. The clustering problem can become very complex...
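The pooling step described here (group borrowers into buckets, then take the mean PD of each bucket) can be sketched as follows. The bucket boundaries in `thresholds` and the sample PDs are hypothetical; choosing good boundaries is the optimization problem the paper addresses and is not shown.

```python
from bisect import bisect_right

def pooled_pds(pds, thresholds):
    """Split borrower PDs at sorted `thresholds` and return each bucket's mean PD.

    A bucket with no borrowers yields None.
    """
    buckets = [[] for _ in range(len(thresholds) + 1)]
    for pd in pds:
        # bisect_right gives the index of the bucket whose range contains pd.
        buckets[bisect_right(thresholds, pd)].append(pd)
    return [sum(b) / len(b) if b else None for b in buckets]

# Hypothetical example: four borrowers, one boundary at a PD of 5%.
means = pooled_pds([0.01, 0.03, 0.10, 0.20], thresholds=[0.05])
```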


A method for initialising the K-means clustering algorithm using kd-trees

Journal: Pattern Recognition Letters 2007

Stephen James Redmond Conor Heneghan

We present a method for initialising the K-means clustering algorithm. Our method hinges on the use of a kd-tree to perform a density estimation of the data at various locations. We then use a modification of Katsavounidis' algorithm, which incorporates this density information, to choose K seeds for the K-means algorithm. We test our algorithm on 36 synthetic data sets and compare with 25 runs ...
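For orientation, here is a minimal sketch of plain maximin seeding, the scheme usually attributed to Katsavounidis et al. It omits the kd-tree density weighting that this paper adds, and the largest-norm rule for the first seed is an assumed convention. The name `maximin_seeds` is hypothetical.

```python
def maximin_seeds(points, k):
    """Pick k seeds: start far from the origin, then repeatedly take the
    point that is farthest from its nearest already-chosen seed."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # First seed: the point with the largest squared norm (assumed convention).
    seeds = [max(points, key=lambda p: sum(a * a for a in p))]
    while len(seeds) < k:
        # Next seed: maximise the distance to the closest existing seed.
        seeds.append(max(points, key=lambda p: min(sqdist(p, s) for s in seeds)))
    return seeds

seeds = maximin_seeds([(0, 0), (1, 0), (10, 0), (10, 1), (4, 9)], k=3)
```

Because it looks only at distances, this plain variant tends to pick outliers as seeds, which is presumably why a density-aware modification is of interest.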


Sentence Clustering Using Continuous Vector Space Representation

2015

Mara Chinea-Rios Germán Sanchis-Trilles Francisco Casacuberta

In this paper, we present a clustering approach based on the combined use of a continuous vector space representation of sentences and the k-means algorithm. The principal motivation of this proposal is to split a big heterogeneous corpus into clusters of similar sentences. We use the word2vec toolkit to obtain the representation of a given word in a continuous vector space. We provide empi...


Applying Kohonen Vector Quantization Networks for Profiling Customers of Mobile Telecommunication Services

2006

Indranil Bose Chen Xi

Customer clustering is used to understand customers’ preferences and behaviors by examining the differences and similarities between customers. Kohonen vector quantization clustering technology is used in this research and is compared with K-means clustering. The data set consists of customer records obtained from a mobile telecommunications service provider. The customers are clustered using va...


Clustering Using a Random Walk Based Distance Measure

2005

Luh Yen Denis Vanvyve Fabien Wouters François Fouss Michel Verleysen Marco Saerens

This work proposes a simple way to improve a clustering algorithm. The idea is to exploit a new distance metric called the “Euclidean Commute Time” (ECT) distance, based on a random walk model on a graph derived from the data. Using this distance measure instead of the usual Euclidean distance in a k-means algorithm makes it possible to retrieve well-separated clusters of arbitrary shape, without working h...
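For reference, the commute-time quantity behind such a distance is commonly written as below. The notation is an assumption, not taken from the truncated abstract: $L = D - A$ is the Laplacian of the graph with adjacency matrix $A$ and degree matrix $D$, $L^{+}$ is its Moore-Penrose pseudoinverse, $e_i$ is the $i$-th standard basis vector, and $V_G$ is the graph volume.

```latex
n(i,j) = V_G \,(e_i - e_j)^{\top} L^{+} (e_i - e_j),
\qquad V_G = \sum_{k} d_{kk},
\qquad d_{\mathrm{ECT}}(i,j) = \sqrt{n(i,j)}
```

Here $n(i,j)$ is the average commute time of a random walker going from node $i$ to node $j$ and back. Its square root behaves as a Euclidean distance between nodes, so it can replace the usual distance in the assignment step of k-means.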


Investigating attributes affecting the performance of WBI users

Journal: Computers & Education 2013

Rana A. Alhajri Steve Counsell Xiaohui Liu

Numerous research studies have explored the effect of hypermedia on learners’ performance using Web Based Instruction (WBI). A learner’s performance is determined by their varying skills and abilities as well as various differences such as gender, cognitive style and prior knowledge. In this paper, we investigate how differences between individuals influenced learners’ performance using a hyper...


Classified information: the data clustering problem

Journal: Computing in Science and Engineering 2003

Nargess Memarsadeghi Dianne P. O'Leary Yalin Evren Sagduyu

a combination of efficiency, emissions, noise levels, and other criteria. Researchers routinely classify documents as “relevant to the current project” or “irrelevant.” Genome decoding divides chromosomes into genes, regulatory regions, signals, and so on. Pathologists identify cells as cancerous or benign. We can classify data into different groups by clustering data that are close with respec...


Dominant Sets in Microarray Data

2005

Xuping Fu Li Teng Yao Li Wenbin Chen Yumin Mao I-Fan Shen Yi Xie

Clustering allows us to extract groups of genes that are tightly coexpressed from microarray data. In this paper, a new method, DSF_Clust, is developed to find dominant sets (clusters). We have performed DSF_Clust on several gene expression datasets and evaluated the results against several criteria. The results showed that this approach could cluster dominant sets of good quality compared to the k-means met...
