Fast Multiplier Methods to Optimize Non-exhaustive, Overlapping Clustering
Abstract
Clustering is one of the most fundamental and important tasks in data mining. Traditional clustering algorithms, such as K-means, assign every data point to exactly one cluster. However, in real-world datasets, the clusters may overlap with each other. Furthermore, there are often outliers that should not belong to any cluster. We recently proposed the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective as a way to address both issues in an integrated fashion. Optimizing this discrete objective is NP-hard, and even though there is a convex relaxation of the objective, straightforward convex optimization approaches are too expensive for large datasets. A practical alternative is to use a low-rank factorization of the solution matrix in the convex formulation. The resulting optimization problem is non-convex, and we can locally optimize the objective function using an augmented Lagrangian method. In this paper, we consider two fast multiplier methods to accelerate the convergence of an augmented Lagrangian scheme: a proximal method of multipliers and an alternating direction method of multipliers (ADMM). For the proximal augmented Lagrangian, or proximal method of multipliers, we show a convergence result for the non-convex case with bound-constrained subproblems. These methods are up to 13 times faster, with no change in quality, compared with a standard augmented Lagrangian method on problems with over 10,000 variables, and they bring runtimes down from over an hour to around 5 minutes.
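As a rough illustration of the augmented Lagrangian (method of multipliers) idea mentioned in the abstract, the sketch below applies it to a toy problem: minimize x^2 + y^2 subject to x + y = 1. This is not the NEO-K-Means objective or the authors' solver; the function name `augmented_lagrangian_toy`, the penalty value `sigma`, and the iteration count are assumptions chosen only for illustration.

```python
def augmented_lagrangian_toy(sigma=10.0, outer_iters=20):
    """Toy method of multipliers for: minimize x^2 + y^2 subject to x + y = 1.

    Hypothetical illustration only; not the NEO-K-Means solver from the paper.
    """
    lam = 0.0      # current estimate of the Lagrange multiplier
    x = y = 0.0
    for _ in range(outer_iters):
        # Inner step: minimize the augmented Lagrangian
        #   L(x, y) = x^2 + y^2 - lam * c + (sigma / 2) * c^2,  c = x + y - 1.
        # Setting the gradient to zero gives, by symmetry, x = y with
        #   2x - lam + sigma * (2x - 1) = 0.
        x = y = (lam + sigma) / (2.0 + 2.0 * sigma)

        # Outer step: update the multiplier using the constraint violation.
        c = x + y - 1.0
        lam -= sigma * c
    return x, y, lam


if __name__ == "__main__":
    x, y, lam = augmented_lagrangian_toy()
    print(x, y, lam)
```

On this toy problem the iterates approach x = y = 0.5 with multiplier 1. In the setting the abstract describes, the inner step would instead be a bound-constrained, non-convex subproblem solved numerically rather than in closed form.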
Similar resources
Non-exhaustive, Overlapping k-means
Traditional clustering algorithms, such as k-means, output a clustering that is disjoint and exhaustive, that is, every single data point is assigned to exactly one cluster. However, in real datasets, clusters can overlap and there are often outliers that do not belong to any cluster. This is a well recognized problem that has received much attention in the past, and several algorithms, such as...
Clustering with evolution strategies
The applicability of evolution strategies (ESs), population-based stochastic optimization techniques, to optimize clustering objective functions is explored. Clustering objective functions are categorized into centroid and non-centroid types. Optimization of the centroid type of objective functions is accomplished by formulating them as functions of real-valued parameters using ESs...
Comparison of Non-Overlapping Domain Decomposition Methods for the Parallel Solution of Magnetic Field Problems
The aim of this paper is to give a unified comparison of non-overlapping domain decomposition methods (DDMs) for solving magnetic field problems. The methods under investigation are the Schur complement method and the Lagrange multiplier based Finite Element Tearing and Interconnecting (FETI) method, and their solvers. The performance of these methods has been investigated in detail for two-dim...
Detecting Overlapping Communities in Social Networks using Deep Learning
In network analysis, a community is typically considered a group of nodes with a high density of edges among themselves and a low density of edges to other parts of the network. Detecting community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...
A fast wallace-based parallel multiplier in quantum-dot cellular automata
Physical limitations of Complementary Metal-Oxide-Semiconductors (CMOS) technology at nanoscale and high cost of lithography have provided the platform for creating Quantum-dot Cellular Automata (QCA)-based hardware. The QCA is a new technology that promises smaller, cheaper and faster electronic circuits, and has been regarded as an effective solution for scalability problems in CMOS technolog...