Constant factor approximation algorithm for the knapsack median problem
Abstract
We give a constant factor approximation algorithm for the following generalization of the k-median problem. We are given a set of clients and facilities in a metric space. Each facility has a facility opening cost, and we are also given a budget B. The objective is to open a subset of facilities of total cost at most B, and minimize the total connection cost of the clients. This settles an open problem of Krishnaswamy-Kumar-Nagarajan-Sabharwal-Saha. The natural linear programming relaxation for this problem has unbounded integrality gap. Our algorithm strengthens this relaxation by adding constraints which stipulate which facilities a client can get assigned to. We show that after suitably modifying a fractional solution, one can get rich structural properties which allow us to get the desired approximation ratio.

1 Problem Definition

The problem of locating facilities to service a set of demands has been widely studied in the computer science and operations research communities [LMW98, MF90]. The trade-off involved in such problems is the following: we would like to open as few facilities as possible, but the clients should not be located too far from the nearest facility. The k-median problem balances the two costs as follows: we are given a set D of clients and a set F of potential facilities lying in a metric space. The goal is to open at most k facilities in F so that the average distance traveled by a client in D to the nearest open facility is minimized. The k-median problem is one of the most well-studied facility location problems, with several constant factor approximation algorithms [AGK01, CG99, CGTS02, JV01].

Motivated by applications in content distribution networks, Hajiaghayi et al. [HKK10] considered the following generalization of the k-median problem, which they called the Red-Blue Median Problem: the set of facilities is partitioned into two sets, $F_1$ and $F_2$, and we are given two parameters $k_1$ and $k_2$.
The goal is to open at most $k_1$ facilities of $F_1$ and $k_2$ facilities of $F_2$ such that the total connection cost of the clients is minimized. They gave a constant factor approximation algorithm for this problem. Krishnaswamy et al. [KKN11] generalized this result to the case of an arbitrary number of partitions of F. In fact, their result holds even when the set of open facilities is required to be an independent set in a matroid (the matroid median problem). They show that the natural linear programming relaxation for this problem has constant integrality gap.

Author affiliation (footnote): Dept. of Computer Science and Engg., IIT Delhi, India 110016, email: [email protected]

In this paper, we consider the following problem. As in the k-median problem, we are given a set of clients D and facilities F in a metric space. Each client j has an associated demand $d_j$, each facility i has a facility opening cost $f_i$, and we are given a budget B. The goal is to open a set of facilities such that their total opening cost is at most B, and minimize the total connection cost of the clients, i.e., $\sum_{j \in D} d_j \, c(i(j), j)$, where $i(j)$ is the facility to which j gets assigned, and c denotes the distance in the underlying metric space. We call this the Knapsack Median Problem. Clearly, the k-median problem is a special case of this problem where all facility costs are one, and B = k.

In this paper, we give a constant factor approximation algorithm for the Knapsack Median Problem. This answers an open question posed by [KKN11]. The main difficulty here lies in the fact that the natural LP relaxation has unbounded integrality gap. This happens even when all facility costs are at most B (the natural LP relaxation for the knapsack problem also has unbounded integrality gap, but it becomes a constant if we remove all items of size more than the knapsack capacity). Consider the LP relaxation given in Section 3 where $x(i, j)$ is 1 if client j is assigned to facility i, and $y_i$ is 1 if facility i gets opened.
The following integrality gap example was given by Charikar and Guha [CG05]: there are two facilities of cost 1 and B respectively, and two clients (with unit demand) co-located with the two facilities respectively. The distance between the two facilities is a large number D. Clearly, any integral solution can open only one facility, and so must pay D, whereas the optimal fractional solution can open the expensive facility to an extent of $1 - \frac{1}{B}$, and so the total cost will be $\frac{D}{B}$. Krishnaswamy et al. [KKN11] showed that the integrality gap remains unbounded even if we strengthen the LP relaxation by adding knapsack-cover inequalities.

One idea of getting around this problem would be to augment the LP relaxation with more information. Suppose we guess the maximum distance between a client and the facility to which it gets assigned in an optimal solution; call this value L. In the LP relaxation, we can set $x(i, j)$ to 0 if $c(i, j) > L$. This would take care of the above integrality gap example: if we set L to be a value less than D, the LP becomes infeasible, and if L > D, we already have D as a lower bound because we have guessed that at least one demand has connection cost at least D in the optimal solution. But now, consider the same example as above where we have D clients located at each of the two facilities respectively. Now, any integral solution will have cost at least $D^2$ (each of the D clients at the unopened facility pays D), and even if we plug in L > D, the LP can get away with value $\frac{D^2}{B}$ only. The lower bound of D is also not enough. Therefore, we need a more subtle way of coming up with a lower bound which looks at groups of clients rather than a single client.

We show that, based on a guess of the value of the optimal solution, one can come up with lower bounds $U_j$ for each client j, and set $x(i, j)$ to 0 in the LP relaxation if $c(i, j) > U_j$. Further, these lower bounds are better than what one can obtain by just looking at client j alone.
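
The two-facility gap example can be checked numerically on a tiny instance. The Python sketch below is illustrative only and is not taken from the paper: the function names `integral_optimum`, `fractional_cost` and `gap_instance`, and the sample values B = 10 and D = 100, are assumptions chosen for the illustration. It brute-forces the best integral solution and evaluates the cost of the fractional solution described in the example (it does not solve the LP itself).

```python
from fractions import Fraction
from itertools import combinations


def integral_optimum(costs, budget, clients, dist):
    """Brute-force the Knapsack Median optimum on a tiny instance.

    costs[i]   : opening cost of facility i (facility i sits at location i)
    clients    : list of (location, demand) pairs
    dist[a][b] : metric distance between locations a and b
    """
    best = None
    facilities = range(len(costs))
    for r in range(1, len(costs) + 1):
        for opened in combinations(facilities, r):
            if sum(costs[i] for i in opened) > budget:
                continue  # this subset violates the budget B
            # every client connects to its nearest open facility
            total = sum(d * min(dist[i][loc] for i in opened)
                        for loc, d in clients)
            best = total if best is None else min(best, total)
    return best


def fractional_cost(B, D, demand):
    """Cost of the fractional solution described in the gap example.

    Facility 0 (cost 1) is fully open; facility 1 (cost B) is open to the
    extent 1 - 1/B, so the opening cost 1 + B * (1 - 1/B) equals B.
    Clients at facility 1 send the remaining 1/B fraction of their demand
    across distance D to facility 0.
    """
    y = [Fraction(1), 1 - Fraction(1, B)]
    assert y[0] * 1 + y[1] * B <= B  # budget constraint holds
    return demand * (1 - y[1]) * D


def gap_instance(B, D, demand):
    """Two facilities of cost 1 and B at distance D, `demand` clients at each."""
    costs = [1, B]
    dist = [[0, D], [D, 0]]
    clients = [(0, demand), (1, demand)]
    return costs, B, clients, dist


if __name__ == "__main__":
    B, D = 10, 100  # hypothetical sample values
    for demand in (1, D):
        instance = gap_instance(B, D, demand)
        print(demand, integral_optimum(*instance), fractional_cost(B, D, demand))
```

With these sample values, unit demands give an integral cost of 100 against a fractional cost of 10, and placing D = 100 clients at each location gives 10000 against 1000. In both cases the ratio is B, which is why a distance guess for a single client does not close the gap.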
Our rounding algorithm, which closely follows that of Krishnaswamy et al. [KKN11], shows that the natural LP relaxation (where we use the bounds $U_j$ as mentioned) has constant integrality gap except for one group of demands. Our algorithm assigns this group of demands to a single open facility, and the connection cost can be bounded by the value of the optimal solution (if our guess for this value is correct). Note that the actual constant in the approximation ratio turns out to be large, and we have not made an attempt to get the optimal value of this constant by balancing various parameters.

1.1 Related Work

The k-median problem has been extensively studied in the past and several constant factor approximation algorithms are known for this problem. Lin and Vitter [LV92] gave a constant factor approximation algorithm for this problem while opening at most k(1 + ε) facilities for an arbitrarily small positive constant ε, even when distances do not obey the triangle inequality. Assuming that distances obey the triangle inequality, the first constant factor approximation algorithm was given by Charikar et al. [CGTS02]. Jain and Vazirani [JV01] gave a primal-dual constant factor approximation algorithm for this problem. Their algorithm first gives a primal-dual algorithm for the facility location problem which has the Lagrange multiplier preserving property (see e.g. [Mes07]). However, their algorithm does not extend to our problem. Indeed, if we use their approach, then we would get two solutions: one of these would open facilities which cost less than the budget B and the other one would spend more than B. Since facilities have non-uniform costs, the idea of combining these two solutions using a randomized algorithm does not seem to work here.

There are several approximation algorithms based on local search techniques as well [KPR98, AGK01]. Hajiaghayi et al.
[HKK10] used this approach to get a constant factor approximation algorithm for the red-blue median problem: recall that here there are two kinds of facilities (red and blue), and for each kind, we have a bound on the number of facilities that can be opened. Each operation in these local search algorithms swaps only one facility at a time. Since facilities have costs, we may need to open and close multiple facilities in each operation. It remains a challenge to analyze such a local search algorithm.

Krishnaswamy et al. [KKN11] gave a constant factor approximation algorithm for the matroid median problem. Here, the set of open facilities should form an independent set in a given matroid. A natural special case (and in fact, this captures many of the ideas in the algorithm) is when the set of facilities is partitioned into K groups, and we are given an upper bound on the number of open facilities of each group. They show that the natural LP relaxation has constant integrality gap. Their algorithm begins by using ideas inherent in the algorithm of Charikar et al. [CGTS02], but has more subtle details. In fact, they also give a constant factor approximation for the Knapsack Median Problem, but exceed the budget B by the maximum cost of any facility. Our rounding algorithm also proceeds along the same lines as the latter algorithm, but the presence of the non-uniform bounds $U_j$ allows us to avoid exceeding the budget B.

There are several bi-criteria approximation algorithms for the Knapsack Median Problem which violate the budget by a (1 + ε)-factor for any ε > 0, and come within a constant of the total connection cost [LV92, CG05]. As mentioned above, Krishnaswamy et al. [KKN11] also gave a constant factor approximation algorithm for this problem while violating the budget by at most the maximum cost of a facility.

1.2 Our Techniques

Consider the natural linear programming relaxation given in Section 3.
As explained in the previous section, the integrality gap of this relaxation is unbounded. Now, suppose we know (up to a constant factor) the value of the optimal solution; call this OPT (we can do this by binary search). Based on this guess, we can come up with a bound $U_j$ for each client j as follows. Suppose j is assigned to a facility i where $c(i, j)$ is at least a parameter $U_j$. Then any other client $j'$ must be assigned to a facility $i'$ satisfying $c(i', j') \ge U_j - c(j, j')$ (otherwise we can improve the connection cost of j by reassigning it to $i'$). Hence, we can deduce that