Affinity Propagation: Clustering Data by Passing Messages

Author

  • Delbert Dueck
Abstract

Delbert Dueck, Doctor of Philosophy, Graduate Department of Electrical & Computer Engineering, University of Toronto, 2009

Clustering data by identifying a subset of representative examples is important for detecting patterns in data and in processing sensory signals. Such “exemplars” can be found by randomly choosing an initial subset of data points as exemplars and then iteratively refining it, but this works well only if that initial choice is close to a good solution. This thesis describes a method called “affinity propagation” that simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Affinity propagation takes as input a set of pairwise similarities between data points and finds clusters on the basis of maximizing the total similarity between data points and their exemplars. Similarity can be simply defined as negative squared Euclidean distance for compatibility with other algorithms, or it can incorporate richer domain-specific models (e.g., translation-invariant distances for comparing images). Affinity propagation’s computational and memory requirements scale linearly with the number of similarities input; for non-sparse problems where all possible similarities are computed, these requirements scale quadratically with the number of data points. Affinity propagation is demonstrated on several applications from areas such as computer vision and bioinformatics, and it typically finds better clustering solutions than other methods in less time.
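The message passing described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the responsibility and availability updates are the standard ones from the Frey and Dueck Science 2007 paper (the abstract itself does not spell them out), and the function names, the damping factor of 0.5, the fixed iteration count, and the use of the median similarity as the self-similarity ("preference") are all assumptions made for this example.

```python
import numpy as np


def negative_squared_euclidean(X, preference=None):
    """Similarity matrix s(i, k) = -||x_i - x_k||^2 (hypothetical helper).

    The diagonal holds the 'preference' of each point to be an exemplar.
    Defaulting it to the median off-diagonal similarity is an assumption.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    S = -(diff ** 2).sum(axis=-1)
    if preference is None:
        preference = np.median(S[~np.eye(len(X), dtype=bool)])
    np.fill_diagonal(S, preference)
    return S


def affinity_propagation(S, damping=0.5, n_iter=200):
    """Dense affinity propagation sketch; returns (exemplar indices, labels)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    idx = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)

    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[idx, best]
        AS[idx, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[idx, best] = S[idx, best] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0.0)
        A_new[idx, idx] = diag
        A = damping * A + (1.0 - damping) * A_new

    # Point k is an exemplar when r(k,k) + a(k,k) > 0.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Every point joins its most similar exemplar; exemplars label themselves.
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels


if __name__ == "__main__":
    # Two well-separated groups of three points each (made-up toy data).
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    exemplars, labels = affinity_propagation(negative_squared_euclidean(X))
    print("exemplars:", exemplars)
    print("labels:", labels)
```

The two n-by-n message arrays make the quadratic memory cost for dense similarities visible. A sparse variant would store messages only for the similarities actually supplied, which is where the linear scaling mentioned in the abstract comes from.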


Related resources

Replication on Affinity Propagation: Clustering by Passing Messages Between Data Points

In this project, I chose the paper Clustering by Passing Messages Between Data Points [1], published in Science in 2007, as the work to replicate. In [1], a new clustering algorithm named Affinity Propagation is proposed. In my report, three research hypotheses from this paper, related to the performance, evaluation and scalability of the algorithm, are studied. I implement the algorithm ...


Local and global approaches of affinity propagation clustering for large scale data

Recently a new clustering algorithm called ‘affinity propagation’ (AP) has been proposed, which efficiently clustered sparsely related data by passing messages between data points. However, we want to cluster large scale data where the similarities are not sparse in many cases. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix. The local approac...


Evolutionary Clustering via Message Passing

When data are acquired at multiple points in time, evolutionary clustering can provide insights into cluster evolution and changes in cluster memberships while enabling performance superior to that obtained by independently clustering data collected at different time points. Existing evolutionary clustering methods typically require additional steps before and after the clustering stage to appr...


Clustering by passing messages between data points.

Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such "exemplars" can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called "affinity propagation," which ta...


Comment on "Clustering by passing messages between data points".

Frey and Dueck (Reports, 16 February 2007, p. 972) described an algorithm termed "affinity propagation" (AP) as a promising alternative to traditional data clustering procedures. We demonstrate that a well-established heuristic for the p-median problem often obtains clustering solutions with lower error than AP and produces these solutions in comparable computation time.




Publication date: 2008