Optimality guarantees for distributed statistical estimation
Authors
Abstract
Large data sets often require distributed statistical estimation, in which the full data set is split across multiple machines and communication between machines is limited. To study such scenarios, we define refinements of the classical minimax risk that apply to distributed settings and compare the resulting risks to the performance of estimators with access to the entire data set. Lower bounds on these quantities precisely characterize the minimum amount of communication required to achieve the centralized minimax risk. We study two classes of distributed protocols: one in which machines send messages independently over channels without feedback, and a second allowing for interactive communication, in which a central server broadcasts the messages from a given machine to all other machines. We establish lower bounds for a variety of problems, including location estimation in several families and parameter estimation in different types of regression models. Our results include a novel class of quantitative data-processing inequalities used to characterize the effects of limited communication.
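The following is a minimal Python sketch of the first (non-interactive) protocol class, written under assumptions that do not come from the paper: each of m machines holds n Gaussian samples with unknown mean θ, θ is known to lie in [-R, R], and each machine sends a single b-bit, stochastically rounded copy of its local mean to a center that averages the decoded values. The helper names (`quantize`, `decode`, `distributed_mean`, `simulate`) are hypothetical. It only illustrates how a per-machine bit budget can inflate the error relative to the centralized estimator; it is not the paper's protocol, and it does not model the interactive broadcast setting or the lower-bound argument.

```python
import random
import statistics
from typing import Sequence, Tuple


def quantize(x: float, bits: int, radius: float, rng: random.Random) -> int:
    """Encode x as one of 2**bits grid indices on [-radius, radius] (bits >= 1).

    Stochastic rounding: round up with probability equal to the fractional
    grid position, so the decoded value is unbiased for any x in the range.
    """
    levels = 2 ** bits - 1                      # number of gaps between grid points
    x = min(max(x, -radius), radius)            # clip to the assumed parameter range
    pos = (x + radius) / (2 * radius) * levels  # continuous position in [0, levels]
    index = int(pos)
    if index < levels and rng.random() < pos - index:
        index += 1
    return index


def decode(index: int, bits: int, radius: float) -> float:
    """Map a grid index back to its point in [-radius, radius]."""
    levels = 2 ** bits - 1
    return -radius + 2 * radius * index / levels


def distributed_mean(data: Sequence[Sequence[float]], bits: int,
                     radius: float, rng: random.Random) -> float:
    """Hypothetical non-interactive protocol: one b-bit message per machine, no feedback."""
    messages = [quantize(statistics.fmean(local), bits, radius, rng) for local in data]
    # The center sees only the messages; it decodes them and averages.
    return statistics.fmean(decode(k, bits, radius) for k in messages)


def simulate(bits: int, theta: float = 0.3, sigma: float = 1.0, machines: int = 20,
             n: int = 50, radius: float = 1.0, trials: int = 100,
             seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo mean squared error of the centralized vs. the distributed estimator."""
    rng = random.Random(seed)
    sq_central = sq_distributed = 0.0
    for _ in range(trials):
        data = [[rng.gauss(theta, sigma) for _ in range(n)] for _ in range(machines)]
        # Centralized estimator: sample mean of the full data set.
        central = statistics.fmean(x for local in data for x in local)
        estimate = distributed_mean(data, bits, radius, rng)
        sq_central += (central - theta) ** 2
        sq_distributed += (estimate - theta) ** 2
    return sq_central / trials, sq_distributed / trials


if __name__ == "__main__":
    for b in (1, 2, 8):
        mse_c, mse_d = simulate(bits=b)
        print(f"bits={b}: centralized MSE={mse_c:.5f}, distributed MSE={mse_d:.5f}")
```

Because every machine holds the same number of samples, averaging the exact local means would reproduce the centralized mean, so any gap in this sketch comes from the bit budget. With 8 bits per machine the two errors should nearly coincide, while with 1 bit per machine the distributed error is typically far above the centralized σ²/(mn) level. This simple averaging scheme is only an illustration and need not be optimal.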
Similar resources
Multiple Optimality Guarantees in Statistical Learning
Multiple Optimality Guarantees in Statistical Learning, by John C. Duchi. Doctor of Philosophy in Computer Science and the Designated Emphasis in Communication, Computation, and Statistics, University of California, Berkeley. Professor Michael I. Jordan, Co-chair; Professor Martin J. Wainwright, Co-chair. Classically, the performance of estimators in statistical learning problems is measured in terms ...
Optimal Simple Step-Stress Plan for Type-I Censored Data from Geometric Distribution
Abstract. A simple step-stress accelerated life testing plan is considered when the failure times in each level of stress are geometrically distributed under Type-I censoring. The problem of choosing the optimal plan is investigated using the asymptotic variance-optimality as well as determinant-optimality and probability-optimality criteria. To illustrate the results of the paper, an example i...
Distributed Statistical Estimation and Rates of Convergence in Normal Approximation
This paper presents new algorithms for distributed statistical estimation that can take advantage of the divide-and-conquer approach. We show that one of the key benefits attained by an appropriate divide-and-conquer strategy is robustness, an important characteristic of large distributed systems. We introduce a class of algorithms that are based on the properties of the geometric median, estab...
Distributed Nonlinear Robust Control for Power Flow in Islanded Microgrids
In this paper, a robust local controller has been designed to balance the power for distributed energy resources (DERs) in an islanded microgrid. Three different DER types are considered in this study: photovoltaic systems, battery energy storage systems, and synchronous generators. Since DER dynamics are nonlinear and uncertain, which may destabilize the power system or decrease the performanc...
Computational Limits of A Distributed Algorithm for Smoothing Spline
In this paper, we explore the statistical versus computational trade-off to address a basic question in the application of a distributed algorithm: what is the minimal computational cost of obtaining statistical optimality? In the smoothing spline setup, we observe a phase transition phenomenon for the number of deployed machines, which serves as a simple proxy for computing cost. Specifically, a sha...