Formalizing Hierarchical Clustering as Integer Linear Programming
نویسندگان
چکیده
Hierarchical clustering is typically implemented as a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as linear constraints. Though exact solvers exists for ILP we show that a simple randomized algorithm and a linear programming (LP) relaxation can be used to provide approximate solutions faster. Formalizing hierarchical clustering also has the benefit that relaxing the constraints can produce novel problem variations such as overlapping clusterings. Our experiments show that our formulation is capable of outperforming standard agglomerative clustering algorithms in a variety of settings, including traditional hierarchical clustering as well as learning overlapping clusterings.
منابع مشابه
A flexible ILP formulation for hierarchical clustering
Hierarchical clustering is a popular approach in a number of fields with many well known algorithms. However, all existing work to our knowledge implements a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as li...
متن کاملA global optimization framework for speaker diarization
In this paper, we propose a new clustering model for speaker diarization. A major problem with using greedy agglomerative hierarchical clustering for speaker diarization is that they do not guarantee an optimal solution. We propose a new clustering model, by redefining clustering as a problem of Integer Linear Programming (ILP). Thus an ILP solver can be used which searches the solution of spea...
متن کاملInteger linear programming for speaker diarization and cross-modal identification in TV broadcast
Most state-of-the-art approaches address speaker diarization as a hierarchical agglomerative clustering problem in the audio domain. In this paper, we propose to revisit one of them: speech turns clustering based on the Bayesian Information Criterion (a.k.a. BIC clustering). First, we show how to model it as an integer linear programming (ILP) problem. Its resolution leads to the same overall d...
متن کاملBayesian Model Based Clustering Procedures
This paper establishes a general framework for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. It is rich enough to encompass a variety of existing procedures, including some recently discussed methodologies involving stochastic search or hierarchical clustering, but more importantly allows the formulat...
متن کاملHierarchical Clustering via Spreading Metrics
We study the cost function for hierarchical clusterings introduced by [Dasgupta, 2016] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [Dasgupta, 2016] that a top-down algorithm returns a hierarchical clustering of cost at most O (αn log n) times the cost of the optimal hierarchical clustering, where ...
متن کامل