clustering error

A Novel MST based Multi-Prototype Clustering Algorithm

2014

Shashwati Banerjea

Clustering is the process of partitioning data objects so that objects in the same partition are more similar to one another than to objects in different partitions. Clustering has many applications, in areas such as medicine, marketing, insurance, and image analysis. The squared-error clustering algorithm uses a single prototype to represent a clust...
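As a rough illustration of the squared-error criterion mentioned in this abstract, the sketch below assumes the usual formulation in which each cluster is represented by a single prototype, taken here to be its mean. The function names `centroid` and `squared_error` are hypothetical, and this is not the multi-prototype method the paper proposes.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Mean of a non-empty list of equal-length points (the single prototype)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def squared_error(clusters: List[List[Point]]) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum((x - m) ** 2 for p in points for x, m in zip(p, c))
    return total


# Two small, well-separated clusters (made-up data for illustration only).
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
print(squared_error(clusters))
```

A lower value means the points sit closer to their prototypes. A single mean per cluster describes compact, roughly spherical groups well and elongated or irregular groups poorly.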


Finding Low Error Clusterings

2009

Maria-Florina Balcan, Mark Braverman

A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairw...


Fast Spectral Clustering via the Nyström Method

2013

Anna Choromanska, Tony Jebara, Hyungtae Kim, Mahesh Mohan, Claire Monteleoni

We propose and analyze a fast spectral clustering algorithm with computational complexity linear in the number of data points that is directly applicable to large-scale datasets. The algorithm combines two powerful techniques in machine learning: spectral clustering algorithms and Nyström methods commonly used to obtain good quality low rank approximations of large matrices. The proposed algori...


Similarity Measures and Clustering of String Patterns

2004

Ana Fred

Clustering is a powerful tool in revealing the intrinsic organization of data. A clustering of structural patterns consists of an unsupervised association of data based on the similarity of their structures and primitives. This chapter addresses the problem of structural clustering, and presents an overview of similarity measures used in this context. The distinction between string matching and...


Noise Clustering Approach to Speaker Verification

2001

Dat Tran, Michael Wagner

In a speaker verification system, a claimed speaker's score is computed to accept or reject the speaker claim. Most of the current methods compute the score as the ratio of the claimed speaker's and the impostors' likelihood functions. Based on analysing false acceptance error obtained by using these methods, we propose a noise clustering approach to find better scores which can reduce that error....


A study of unsupervised clustering techniques for language modeling

2008

Sangyun Hahn, Abhinav Sethy, Hong-Kwang Jeff Kuo, Bhuvana Ramabhadran

There has been recent interest in clustering text data to build topic-specific language models for large vocabulary speech recognition. In this paper, we studied various unsupervised clustering algorithms on several corpora. First we compared the clustering methods with quality metrics such as entropy and purity. Of the techniques studied, two-phase bisecting K-means achieved good performance w...
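The entropy and purity metrics named in this abstract can be sketched as follows. This assumes the commonly used definitions (purity as the fraction of items that belong to the majority class of their cluster, entropy as the size-weighted average of per-cluster class entropy); the abstract does not show the paper's exact formulas, and the function names and sample labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def _class_counts(labels_true: Sequence[Hashable],
                  labels_pred: Sequence[Hashable]) -> Dict[Hashable, Counter]:
    """Count reference classes inside each predicted cluster."""
    by_cluster: Dict[Hashable, Counter] = {}
    for truth, cluster in zip(labels_true, labels_pred):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    return by_cluster


def purity(labels_true, labels_pred) -> float:
    """Fraction of items that match the majority class of their cluster."""
    counts = _class_counts(labels_true, labels_pred)
    return sum(max(c.values()) for c in counts.values()) / len(labels_true)


def clustering_entropy(labels_true, labels_pred) -> float:
    """Size-weighted average of per-cluster class entropy, in bits."""
    counts = _class_counts(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((k / size) * math.log2(k / size) for k in c.values())
        total += (size / n) * h
    return total


# Made-up topic labels and cluster assignments for six documents.
topics = ["a", "a", "a", "b", "b", "b"]
assigned = [0, 0, 1, 1, 1, 1]
print(purity(topics, assigned), clustering_entropy(topics, assigned))
```

Higher purity and lower entropy both indicate clusters that line up better with the reference classes. Both metrics improve trivially as the number of clusters grows, so they are usually compared at a fixed cluster count.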


A Novel Approach Towards K-Mean Clustering Algorithm With PSO

2014

Gursharan Saini, Harpreet Kaur

In this paper, the proposed approach is a unique combination of two of the most popular clustering algorithms, Particle Swarm Optimization (PSO) and K-Means, to achieve better clustering results. Clustering is a technique for grouping homogeneous objects of a dataset with the aim of extracting some meaningful pattern or information. The K-Means algorithm is the most popular clustering algorithm because of its easy ...


Bayesian Hierarchical Cross-Clustering

2011

Dazhuo Li, Patrick Shafto

Most clustering algorithms assume that all dimensions of the data can be described by a single structure. Cross-clustering (or multiview clustering) allows multiple structures, each applying to a subset of the dimensions. We present a novel approach to cross-clustering, based on approximating the solution to a Cross Dirichlet Process mixture (CDPM) model [Shafto et al., 2006, Mansinghka et al., ...


ISODATA clustering with parameter (threshold for merge and split) estimation based on GA: Genetic Algorithm

2007

Kohei Arai

A genetic algorithm (GA) based ISODATA clustering method is proposed. GA clustering is now widely available. One of the problems of GA clustering is poor clustering performance due to the assumption that clusters are represented as convex functions. The well-known ISODATA clustering has threshold parameters for merge and split. The parameters have to be determined without any assumption (co...


AHP Algorithm and Unsupervised Clustering in Auto Insurance Fraud Detection

Thesis: Ministry of Science, Research and Technology - Allameh Tabataba'i University - Faculty of Economics, 1389 SH (2010-2011)

Mohsen Serajzadeh, Hasan Rashidi

This thesis is a study of insurance fraud in Iran's automobile insurance industry. It explores the use of an expert linkage between unsupervised clustering and the analytic hierarchy process (AHP), and presents the findings from applying these algorithms to automobile insurance claim fraud detection. The expert linkage determination objective function plan provides us with a way to determine whi...
