Coded Distributed Computing With Partial Recovery

Authors

Abstract

Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behavior and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to the distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of the subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed scheme in terms of the trade-off between computation accuracy and latency.
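To make the accuracy/speed trade-off described in the abstract concrete, here is a minimal Python sketch. It is not the paper's CCPR scheme: it simulates only an uncoded assignment with redundancy, in which the master stops as soon as a chosen number of row blocks of the product has been recovered. The function names `matvec` and `simulate_partial_recovery`, the cyclic assignment of blocks to workers, and the i.i.d. exponential delay model are all assumptions made for illustration.

```python
import random


def matvec(rows, x):
    """Multiply a list of matrix rows by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def simulate_partial_recovery(A, x, num_workers, load, target, seed=0):
    """Return (completion_time, recovered), where recovered maps
    block index -> block result.

    Hypothetical setup (not taken from the paper): A is split into
    num_workers row blocks (len(A) must be divisible by num_workers).
    Worker i computes blocks i, i+1, ..., i+load-1 (mod num_workers)
    in that order and reports each result as soon as it is ready.
    The master stops once it holds `target` distinct block results.
    """
    rng = random.Random(seed)
    k = num_workers
    size = len(A) // k
    blocks = [A[j * size:(j + 1) * size] for j in range(k)]

    # One event (finish_time, block_index) per assigned subtask.
    events = []
    for i in range(k):
        t = 0.0
        for s in range(load):
            t += rng.expovariate(1.0)  # assumed exponential task delay
            events.append((t, (i + s) % k))
    events.sort()

    # The master processes results in order of arrival.
    recovered = {}
    for t, j in events:
        if j not in recovered:
            recovered[j] = matvec(blocks[j], x)
        if len(recovered) >= target:
            return t, recovered
    raise ValueError("target exceeds the number of recoverable blocks")


# Example: 8x3 matrix, 4 workers, each assigned 2 of the 4 row blocks.
A = [[float(i + j) for j in range(3)] for i in range(8)]
x = [1.0, 2.0, 3.0]
t_full, full = simulate_partial_recovery(A, x, num_workers=4, load=2, target=4)
t_part, part = simulate_partial_recovery(A, x, num_workers=4, load=2, target=3)
```

Because both calls use the same seed, they see the same simulated delays, so the partial-recovery run (3 of 4 blocks) can never finish later than the full-recovery run. Lowering `target` trades accuracy of the recovered product for lower latency, which is the kind of trade-off the abstract refers to.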


Related resources

Coded Computing for Distributed Graph Analytics

Many distributed graph computing systems have been developed recently for efficient processing of massive graphs. These systems require many messages to be exchanged among computing machines at each step of the computation, making communication bandwidth a major performance bottleneck. We present a coded computing framework that systematically injects redundancy in the computation phase to enab...

Coded Distributed Computing with Node Cooperation Substantially Increases Speedup Factors

This work explores a distributed computing setting where K nodes are assigned fractions (subtasks) of a computational task in order to perform the computation in parallel. In this setting, a well-known main bottleneck has been the internode communication cost required to parallelize the task, because unlike the computational cost which could keep decreasing as K increases, the communication cos...

Optimal Recovery Schemes in Distributed Computing

Clusters and distributed systems offer fault tolerance and high performance through load sharing, and are thus attractive in real-time applications. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers fail this must be redistributed. The redistribution is determined by the recovery scheme. The recovery scheme sho...

A New Combinatorial Design of Coded Distributed Computing

Coded distributed computing introduced by Li et al. in 2015 is an efficient approach to trade computing power to reduce the communication load in general distributed computing frameworks such as MapReduce. In particular, Li et al. show that increasing the computation load in the Map phase by a factor of r can create coded multicasting opportunities to reduce the communication load in the Reduce...

Coded Caching with Distributed Storage

Content delivery networks store information distributed across multiple servers, so as to balance the load and avoid unrecoverable losses in case of node or disk failures. Coded caching has been shown to be a useful technique which can reduce peak traffic rates by pre-fetching popular content at the end users and encoding transmissions so that different users can extract different information f...

Journal

Journal title: IEEE Transactions on Information Theory

Year: 2022

ISSN: 0018-9448, 1557-9654

DOI: https://doi.org/10.1109/tit.2021.3133791