Stochastic Dual Coordinate Descent with Alternating Direction Multiplier Method
Author
Abstract
A. Derivation of the proximal operation for the smoothed hinge loss

By the definition of the smoothed hinge loss, we have that, for $-1 \le y_i v \le 0$,

$$f_i^*(v) = \sup_{u \in \mathbb{R}} \{ uv - f_i(u) \} = \sup_{u \in \mathbb{R}} \left\{ uv - \tfrac{1}{2}(1 - y_i u)^2 \right\} \ldots$$
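Taking the standard smoothed hinge loss with unit smoothing parameter (so that the quadratic branch is $\tfrac{1}{2}(1 - y_i u)^2$, as above), the supremum can be evaluated in closed form; a sketch of the remaining steps:

\begin{align*}
  \frac{\partial}{\partial u}\Bigl[\, uv - \tfrac{1}{2}(1 - y_i u)^2 \Bigr]
    &= v + y_i (1 - y_i u) = 0
    \;\Longrightarrow\; u^* = y_i + v \quad (\text{using } y_i^2 = 1),\\
  f_i^*(v) &= u^* v - \tfrac{1}{2}\bigl(1 - y_i u^*\bigr)^2
            = y_i v + v^2 - \tfrac{1}{2} v^2
            = y_i v + \tfrac{1}{2} v^2,
    \qquad -1 \le y_i v \le 0 .
\end{align*}

Outside the interval $-1 \le y_i v \le 0$ the supremum is $+\infty$ for the standard smoothed hinge loss, which is why the conjugate is restricted to that range.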
Similar resources
Stochastic Dual Coordinate Ascent with Alternating Direction Method of Multipliers
We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems. Our method is based on the Alternating Direction Method of Multipliers (ADMM) to deal with complex regularization functions such as structured regularizations. It naturally accommodates mini-batch updates, which speed up convergence. We show that, under mi...
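The closed-form conjugate derived in the abstract above is what makes each dual coordinate step cheap. As a point of reference, the following is a minimal sketch of plain stochastic dual coordinate ascent (without the ADMM extension described here) for an L2-regularized smoothed hinge loss; the function name sdca_smoothed_hinge, the smoothing parameter gamma, and the synthetic data are illustrative assumptions, not the paper's implementation.

import numpy as np

def sdca_smoothed_hinge(X, y, lam=0.1, epochs=10, gamma=1.0, seed=0):
    """Plain SDCA sketch for (1/n) sum_i f_i(w^T x_i) + (lam/2)||w||^2,
    where f_i is the smoothed hinge loss with smoothing parameter gamma.
    Illustrative only; not the ADMM variant described above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)          # dual variables, one per example
    w = np.zeros(d)              # primal vector, w = X^T alpha / (lam * n)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x_i, y_i = X[i], y[i]
            s = x_i @ x_i / (lam * n)
            # Closed-form maximizer of the one-dimensional dual subproblem,
            # clipped so that y_i * alpha_i stays in [0, 1].
            p = (1.0 - y_i * (x_i @ w) + s * y_i * alpha[i]) / (s + gamma)
            p = min(1.0, max(0.0, p))
            delta = y_i * p - alpha[i]
            alpha[i] += delta
            w += delta * x_i / (lam * n)
    return w

# Tiny usage example on synthetic data (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=200))
    w = sdca_smoothed_hinge(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))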
Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
We present and study a distributed optimization algorithm that employs a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often outperform stochastic gradient descent methods on regularized loss minimization problems, yet little effort has gone into studying them in a distributed framework. We ...
Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method
We develop new stochastic optimization methods that are applicable to a wide range of structured regularizations. Our methods combine basic stochastic optimization techniques with the Alternating Direction Multiplier Method (ADMM), a general framework for optimizing composite functions that has a wide range of applications. We propose two types of online variants of AD...
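Since this abstract leans on ADMM as a general framework for composite objectives, a minimal batch ADMM sketch may help fix ideas. The example below applies the classical (non-online) ADMM iteration to the lasso problem min_x (1/2)||Ax - b||^2 + lam*||x||_1; it illustrates the generic x-step / z-step / dual-step pattern only, not the online variants the paper proposes, and the function name admm_lasso and all parameters are assumptions for the sake of the example.

import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    """Classical (batch) ADMM for the lasso:
        minimize (1/2)||Ax - b||^2 + lam*||z||_1  subject to  x - z = 0.
    A minimal sketch of the generic ADMM pattern, not the online variants above."""
    m, d = A.shape
    x = np.zeros(d)
    z = np.zeros(d)
    u = np.zeros(d)                      # scaled dual variable
    AtA = A.T @ A
    Atb = A.T @ b
    # Factor once; the x-update solves the same linear system each iteration.
    L = np.linalg.cholesky(AtA + rho * np.eye(d))
    for _ in range(n_iter):
        # x-update: minimize the smooth part plus the quadratic penalty.
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: proximal operator of lam*||.||_1 (soft-thresholding).
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # Dual update on the constraint residual x - z.
        u += x - z
    return z

# Illustrative usage on synthetic sparse-regression data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 20))
    x_true = np.zeros(20)
    x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true + 0.01 * rng.normal(size=100)
    print(np.round(admm_lasso(A, b, lam=1.0), 2))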
Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers
Linear classification has demonstrated success in many application areas. Modern algorithms for linear classification can train reasonably good models while going through the data in only tens of rounds. However, large data often does not fit in the memory of a single machine, so the bottleneck in large-scale learning becomes disk I/O rather than the CPU. Following this observation, Yu et al....
Cis400/401 Final Project Report
The rise of 'big data' and large-scale machine learning has created an increasing need for distributed optimization. Most of the current literature has focused on coordinate descent, a prominent distributed optimization technique, due to its simplicity and effectiveness. We focus on implementing two other optimization techniques: distributed dual descent and the a...