Clustering by Maximizing Sum-of-Squared Separation Distance

Authors

Yixin Chen

Jinbo Bi

Abstract

Maximizing the separating margin is crucial for the good generalization performance of Support Vector Machines (SVMs). Analogous to the definition of separation distance or separating margin in SVMs, we propose a definition of separation distance in clustering tasks when a hyperplane is used to separate clusters. For given training data and a given distance metric, by maximizing the proposed separation distance, our clustering algorithm constructs an “optimal” hyperplane that can be applied to unseen data in the future. The resulting hyperplane corresponds to a nonlinear decision boundary in the input feature space through an appropriate distance feature mapping. A graph-theoretic perspective of the proposed method is discussed. In particular, we show that, under certain conditions, the proposed clustering algorithm is equivalent to a spectral relaxed graph cut. Extensive experimental results are provided to validate the method.
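
The abstract relates the method to a spectral relaxed graph cut. The sketch below illustrates that general idea only; it is a generic textbook two-way spectral split, not the authors' algorithm, and it does not implement their separation distance. The function name `spectral_bipartition`, the Gaussian similarity, the bandwidth `sigma`, and the toy points are all assumptions made for illustration.

```python
import math

def spectral_bipartition(points, sigma=1.0, iters=500):
    """Two-way split from a relaxed graph cut (generic sketch, hypothetical names)."""
    n = len(points)
    # Gaussian similarity graph; sigma is an assumed bandwidth, not from the paper.
    W = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # 2 * max degree bounds the eigenvalues of the Laplacian L = D - W.
    shift = 2.0 * max(deg)
    # Power iteration on (shift*I - L), kept orthogonal to the constant
    # vector, converges to the Fiedler vector (second-smallest eigenvector of L).
    v = [float(i) - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        v = [(shift - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The sign of each Fiedler-vector entry gives the cluster assignment.
    return [1 if x > 0 else 0 for x in v]

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (3.0, 3.0), (3.1, 2.8), (2.9, 3.2)]
labels = spectral_bipartition(points)
```

Thresholding the Fiedler vector at zero is the simplest rounding of the relaxed solution; other thresholds or normalized Laplacians are common alternatives.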

Similar resources

A survey on exact methods for minimum sum-of-squares clustering

Minimum sum-of-squares clustering (MSSC) consists in partitioning a given set of n entities into k clusters in order to minimize the sum of squared distances from the entities to the centroid of their cluster. Among many criteria used for cluster analysis, the minimum sum-of-squares is one of the most popular since it expresses both homogeneity and separation. A mathematical programming formula...

Semi-supervised composite kernel learning using distance metric learning techniques

The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric directly affects the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

An Efficient Unified K-Means Clustering Technique for Microarray Gene Expression Data

Problem statement: Using microarray techniques one could monitor the expression levels of thousands of genes simultaneously. One challenge was how to derive meaningful insights into expressed data. This might be carried out by clustering techniques such as hierarchical and k-means, but most of the clustering techniques were largely heuristic in nature and were associated with some unresolved is...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well-researched constrained clustering algorithms is called microaggregation. In a microaggreg...

T-test distance and clustering criterion for speaker diarization

In this paper, we present an application of Student’s t-test to measure the similarity between two speaker models. The measure is evaluated by comparing it with other distance metrics: the Generalized Likelihood Ratio, the Cross Likelihood Ratio and the Normalized Cross Likelihood Ratio in a speaker detection task. We also propose an objective criterion for speaker clustering. The criterion deduces ...

Publication date: 2005
