On Coreset Constructions for the Fuzzy $K$-Means Problem
Abstract
In this paper, we present coreset constructions for the fuzzy K-means problem. First, we show that one can construct weak coresets for fuzzy K-means. Second, we show that there are coresets for fuzzy K-means with respect to balanced fuzzy K-means solutions. Third, we use these coresets to develop a randomized approximation algorithm whose runtime is polynomial in the number of given points and in their dimension.
Similar resources
Complexity and Approximation of the Fuzzy K-Means Problem
The fuzzy K-means problem is a generalization of the classical K-means problem to soft clusterings, i.e. clusterings where each point belongs to each cluster to some degree. Although popular in practice, prior to this work the fuzzy K-means problem had not been studied from a complexity-theoretic or algorithmic perspective. We show that optimal solutions for fuzzy K-means cannot, in general, b...
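To make the notion of a soft clustering concrete, the following is a minimal sketch of the commonly used fuzzy K-means objective. It is not taken from the abstract above: the fuzzifier `m`, the function names, and the closed-form membership update are assumptions based on the standard textbook formulation, in which point n belongs to cluster k with degree r_nk, the degrees of each point sum to 1, and the cost is the sum of r_nk^m times the squared Euclidean distance.

```python
def sq_dist(x, mu):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def fuzzy_kmeans_cost(points, centers, memberships, m=2.0):
    """Assumed standard objective: sum_n sum_k r_nk^m * ||x_n - mu_k||^2."""
    total = 0.0
    for x, r in zip(points, memberships):
        # Each point's membership degrees must sum to 1.
        assert abs(sum(r) - 1.0) < 1e-9
        for mu, r_nk in zip(centers, r):
            total += (r_nk ** m) * sq_dist(x, mu)
    return total


def optimal_memberships(points, centers, m=2.0):
    """Memberships minimizing the cost for fixed centers (requires m > 1)."""
    result = []
    for x in points:
        d = [sq_dist(x, mu) for mu in centers]
        if any(dk == 0.0 for dk in d):
            # The point coincides with a center: split all weight
            # evenly among the coinciding centers.
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            s = sum(hits)
            result.append([h / s for h in hits])
        else:
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
            s = sum(inv)
            result.append([v / s for v in inv])
    return result


points = [(1.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
r = optimal_memberships(points, centers)
cost = fuzzy_kmeans_cost(points, centers, r)
```

In this toy instance the single point is equidistant from both centers, so it belongs to each cluster with degree 0.5, which illustrates how a soft clustering differs from the hard assignment of classical K-means.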
StreamKM++: A Clustering Algorithm for Data Streams
We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ seeding procedure to obtain small core...
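The non-uniform sampling mentioned above is described as similar to k-means++ seeding. As an illustration only, here is a minimal sketch of that seeding procedure (often called D² sampling); it is not the StreamKM++ coreset construction itself, and the function name `kmeanspp_seeding` and its parameters are hypothetical.

```python
import random


def kmeanspp_seeding(points, k, seed=0):
    """Pick k centers: the first uniformly at random, each further one with
    probability proportional to its squared distance to the nearest center
    chosen so far (D^2 sampling)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [
            min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers)
            for x in points
        ]
        if sum(weights) == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
seeds = kmeanspp_seeding(data, k=4)
```

Points far from the current centers are favored, and a point that is already a center has weight zero, so with four distinct input points and `k=4` every point is selected exactly once.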
Scalable and Distributed Clustering via Lightweight Coresets
Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of coresets called lightweight cor...
Coresets and approximate clustering for Bregman divergences
We study the generalized k-median problem with respect to a Bregman divergence $D_\phi$. Given a finite set $P \subseteq \mathbb{R}^d$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $\mathrm{cost}(P,C) = \sum_{p \in P} \min_{c \in C} D_\phi(p,c)$ is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text classification, and speech processing. We g...
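The cost function above can be sketched in a few lines. The sketch assumes the usual definition of a Bregman divergence, D_φ(p, c) = φ(p) − φ(c) − ⟨∇φ(c), p − c⟩, which the truncated abstract does not spell out; the helper names and the choice of φ as the squared Euclidean norm (which yields the squared Euclidean distance) are illustrative assumptions.

```python
def bregman_divergence(phi, grad_phi, p, c):
    """Assumed definition: phi(p) - phi(c) - <grad phi(c), p - c>."""
    inner = sum(g * (pi - ci) for g, pi, ci in zip(grad_phi(c), p, c))
    return phi(p) - phi(c) - inner


def sq_norm(x):
    """Example generator phi(x) = ||x||^2."""
    return sum(xi * xi for xi in x)


def grad_sq_norm(x):
    """Gradient of phi(x) = ||x||^2."""
    return [2.0 * xi for xi in x]


def bregman_cost(P, C, phi, grad_phi):
    """cost(P, C): each point pays its divergence to the cheapest center."""
    return sum(
        min(bregman_divergence(phi, grad_phi, p, c) for c in C) for p in P
    )


P = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
C = [(0.0, 0.0), (4.0, 0.0)]
total = bregman_cost(P, C, sq_norm, grad_sq_norm)
```

With this choice of φ the cost coincides with the classical k-means cost; other generators, such as the negative entropy, would give a Kullback-Leibler-type divergence instead.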
StreamKM++: A Clustering Algorithm for Data Streams
We develop a new k-means clustering algorithm for data streams of points from a Euclidean space. We call this algorithm StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm of Arthur and Vassilvitskii (SODA '07). To compute the small sample, we propose two new techniques. First, we use an adaptive, non-u...
Journal title:
- CoRR
Volume abs/1612.07516
Publication date: 2016