On Scheduling Coflows
Abstract
Applications designed for data-parallel computation frameworks such as MapReduce usually alternate between computation and communication stages. Coflow scheduling is a recently popular networking abstraction introduced to capture such application-level communication patterns in datacenters. In this framework, a datacenter is modeled as a single non-blocking switch with $m$ input ports and $m$ output ports. A coflow $j$ is a collection of flow demands $\{d^{j}_{io}\}_{i \in [m],\, o \in [m]}$ and is complete once all of its requisite flows have been scheduled. We consider the offline coflow scheduling problem, with and without release times, with the objective of minimizing the total weighted completion time. Coflow scheduling generalizes the well-studied concurrent open shop scheduling problem and is thus NP-hard. Qiu, Stein and Zhong [14] obtain the first constant-factor approximation algorithms for this problem via LP rounding, giving a deterministic $\frac{67}{3}$-approximation and a randomized $\left(9 + \frac{16\sqrt{2}}{3}\right) \approx 16.54$-approximation algorithm. In this paper, we give a combinatorial algorithm that improves significantly upon theirs, yielding a deterministic 5-approximation algorithm with release times. For the case without release times, it is a 4-approximation.
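To make the model concrete, here is a minimal Python sketch. It is not the paper's combinatorial algorithm; it only illustrates the objective. It assumes unit capacity on every port and flows that may be split over time. Under these assumptions a single coflow can be finished in time equal to its largest port load (the maximum row or column sum of its demand matrix), by the Birkhoff-von Neumann decomposition. All helper names (`port_load`, `sequential_weighted_completion`, `smith_order`, `open_shop_job_as_coflow`) are hypothetical. The sketch evaluates the total weighted completion time of running coflows one at a time in a given order with zero release times, orders them with a simple Smith-style load-to-weight heuristic, and shows how a concurrent open shop job embeds as a diagonal demand matrix, which is one way to see why coflow scheduling generalizes that problem.

```python
from typing import List, Sequence

# Hypothetical representation: demand[i][o] is the data a coflow must send
# from input port i to output port o. Every port is assumed to have unit capacity.
Matrix = List[List[float]]


def port_load(demand: Matrix) -> float:
    """Largest total demand on any single input port (row) or output port (column)."""
    m = len(demand)
    row_sums = [sum(demand[i]) for i in range(m)]
    col_sums = [sum(demand[i][o] for i in range(m)) for o in range(m)]
    return max(row_sums + col_sums)


def sequential_weighted_completion(
    coflows: Sequence[Matrix], weights: Sequence[float], order: Sequence[int]
) -> float:
    """Total weighted completion time when coflows run one at a time in `order`.

    Assumes zero release times and splittable flows, so each coflow occupies
    the switch for exactly its port load (Birkhoff-von Neumann decomposition).
    """
    now = 0.0
    total = 0.0
    for j in order:
        now += port_load(coflows[j])  # coflow j finishes once its load is served
        total += weights[j] * now
    return total


def smith_order(coflows: Sequence[Matrix], weights: Sequence[float]) -> List[int]:
    """Simple heuristic (not the paper's algorithm): sort by load / weight, ascending.

    Weights are assumed to be strictly positive.
    """
    return sorted(range(len(coflows)), key=lambda j: port_load(coflows[j]) / weights[j])


def open_shop_job_as_coflow(processing_times: Sequence[float]) -> Matrix:
    """Embed a concurrent open shop job: machine i becomes the port pair (i, i)."""
    m = len(processing_times)
    return [[processing_times[i] if i == o else 0.0 for o in range(m)] for i in range(m)]


if __name__ == "__main__":
    a = [[2.0, 0.0], [0.0, 1.0]]
    b = [[0.0, 1.0], [1.0, 0.0]]
    order = smith_order([a, b], [1.0, 1.0])
    print(order, sequential_weighted_completion([a, b], [1.0, 1.0], order))
```

For two coflows A = [[2, 0], [0, 1]] (load 2) and B = [[0, 1], [1, 0]] (load 1) with unit weights, the heuristic schedules B first, giving a total weighted completion time of 1 + 3 = 4, versus 2 + 3 = 5 for the reverse order. Running coflows strictly one after another is only one feasible schedule; a better schedule may overlap coflows whose flows use disjoint ports.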
Similar papers
An Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters
In data-parallel computing frameworks, intermediate parallel data is often produced at various stages which needs to be transferred among servers in the datacenter network (e.g. the shuffle phase in MapReduce). A stage often cannot start or be completed unless all the required data pieces from the preceding stage are received. Coflow is a recently proposed networking abstraction to capture such...
Multi-hop Coflow Routing and Scheduling in Data Centers
Communication in data centers often involves many parallel flows that all share the same performance goal. A useful abstraction, coflow, is proposed to express the communication requirements of prevalent data parallel paradigms. The multiple coflow routing and scheduling problem faces challenges when deriving a good theoretical performance ratio because coexisting coflows will compete for the s...
Experimental Analysis of Algorithms for Coflow Scheduling
Modern data centers face new scheduling challenges in optimizing job-level performance objectives, where a significant challenge is the scheduling of highly parallel data flows with a common performance goal (e.g., the shuffle operations in MapReduce applications). Chowdhury and Stoica [6] introduced the coflow abstraction to capture these parallel communication patterns, and Chowdhury et al. [...
Efficient Coflow Scheduling Without Prior Knowledge — Public Review
A lot of blood, sweat and tears have been shed in the quest to improve network performance, mostly in terms of flow completion times. But this race to the top has meant that we have been guilty of forgetting what really matters — application performance. Applications have different notions of the utility they derive from flow completion, so determining the right network metric to optimize is a ...
Online Partial Throughput Maximization for Multidimensional Coflow
Coflow has recently been introduced to capture communication patterns that are widely observed in the cloud and massively parallel computing. Coflow consists of a number of flows that each represents data communication from one machine to another. A coflow is completed when all of its flows are completed. Due to its elegant abstraction of the complicated communication processes found in various...