Accelerated Stochastic Gradient Descent for Minimizing Finite Sums
Abstract
We propose an optimization method for minimizing finite sums of smooth convex functions. Our method combines accelerated gradient descent (AGD) with the stochastic variance reduced gradient (SVRG) in a mini-batch setting. Unlike SVRG, our method can be applied directly to both non-strongly convex and strongly convex problems. We show that it achieves a lower overall complexity than recently proposed methods that support non-strongly convex problems, while retaining a fast rate of convergence on strongly convex problems. Our experiments demonstrate the effectiveness of the method.
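To make the combination concrete, the following is a minimal sketch, not the paper's exact algorithm or parameter schedule, of an SVRG-style variance-reduced mini-batch gradient paired with a Nesterov-style momentum step on a toy least-squares problem. The step size, momentum weight, batch size, and epoch length are illustrative choices.

```python
# Sketch: SVRG-style variance reduction + momentum on least squares
# f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2   (toy problem, not the paper's setup)
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def full_grad(x):
    return A.T @ (A @ x - b) / n

def batch_grad(x, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

eta, beta = 0.05, 0.9            # illustrative step size and momentum weight
batch_size, epochs = 10, 30
inner = n // batch_size          # inner-loop length per snapshot

x = np.zeros(d)
y = x.copy()                     # extrapolated (momentum) point
for _ in range(epochs):
    x_snap = x.copy()            # snapshot for variance reduction
    g_snap = full_grad(x_snap)   # full gradient at the snapshot
    for _ in range(inner):
        idx = rng.choice(n, size=batch_size, replace=False)
        # variance-reduced mini-batch gradient, unbiased for full_grad(y)
        v = batch_grad(y, idx) - batch_grad(x_snap, idx) + g_snap
        x_new = y - eta * v
        y = x_new + beta * (x_new - x)   # Nesterov-style extrapolation
        x = x_new

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```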
Similar resources
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. However, in the context of empirical risk minimization, it is often helpful to augment the training set by considering random perturbations of input examples. In this case, the objective is no longer a finite sum, and the main candidate for optimization is the stochas...
A Simple Practical Accelerated Method for Finite Sums
We describe a novel optimization method for finite sums (such as empirical risk minimization problems) building on the recently introduced SAGA method. Our method achieves an accelerated convergence rate on strongly convex smooth problems. Our method has only one parameter (a step size), and is radically simpler than other accelerated methods for finite sums. Additionally it can be applied when...
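For context, the building block this snippet refers to is SAGA (Defazio et al., 2014). Below is a minimal sketch of plain SAGA on a toy logistic-regression objective; the accelerated method described above builds on it but is not reproduced here, and the step size is an illustrative choice.

```python
# Sketch: basic SAGA on logistic loss (illustration only, not the accelerated variant)
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10
A = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

def grad_i(x, i):
    # gradient of log(1 + exp(-y_i * <a_i, x>)) with respect to x
    z = y[i] * (A[i] @ x)
    return -y[i] * A[i] / (1.0 + np.exp(z))

step = 0.05                                          # the single tuning parameter
x = np.zeros(d)
table = np.array([grad_i(x, i) for i in range(n)])   # stored per-example gradients
avg = table.mean(axis=0)                             # running average of the table

for _ in range(20 * n):                              # ~20 effective passes over the data
    i = rng.integers(n)
    g_new = grad_i(x, i)
    # SAGA direction: unbiased for the full gradient, variance shrinks as x converges
    x = x - step * (g_new - table[i] + avg)
    avg = avg + (g_new - table[i]) / n
    table[i] = g_new
```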
Supplementary Materials: Accelerated Stochastic Gradient Descent for Minimizing Finite Sums
1 Proof of Proposition 1 We now prove Proposition 1, which gives a condition for compactness of the sublevel set. Proof. Let B(r) and S(r) denote the ball and the sphere of radius r centered at the origin. By an affine transformation, we may assume that X∗ contains the origin O, that X∗ ⊂ B(1), and that X∗ ∩ S(1) = ∅. Then, for all x ∈ S(1), ⟨∇f(x), x⟩ ≥ f(x) − f(O) > 0, where we use convexity for ...
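For readability, the convexity step the excerpt invokes can be spelled out: applying the first-order convexity inequality at x with the origin O as comparison point gives

```latex
\[
f(O) \;\ge\; f(x) + \langle \nabla f(x),\, O - x \rangle
       \;=\; f(x) - \langle \nabla f(x),\, x \rangle,
\qquad\text{hence}\qquad
\langle \nabla f(x),\, x \rangle \;\ge\; f(x) - f(O).
\]
```

The strict inequality f(x) − f(O) > 0 then holds because O ∈ X∗ while x ∈ S(1) and X∗ ∩ S(1) = ∅, so x is not a minimizer.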
Stochastic Smoothing for Nonsmooth Minimizations: Accelerating SGD by Exploiting Structure
In this work we consider the stochastic minimization of nonsmooth convex loss functions, a central problem in machine learning. We propose a novel algorithm called Accelerated Nonsmooth Stochastic Gradient Descent (ANSGD), which exploits the structure of common nonsmooth loss functions to achieve optimal convergence rates for a class of problems including SVMs. It is the first stochastic algori...
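As one illustration of the kind of structure such methods can exploit (a standard construction, not necessarily the exact smoothing used in this paper), the nonsmooth hinge loss ℓ(z) = max(0, 1 − z) underlying SVMs admits a smooth surrogate with parameter μ > 0 whose gradient is (1/μ)-Lipschitz:

```latex
\[
\ell_\mu(z) \;=\;
\begin{cases}
0, & 1 - z \le 0,\\[4pt]
\dfrac{(1-z)^2}{2\mu}, & 0 < 1 - z < \mu,\\[8pt]
(1 - z) - \dfrac{\mu}{2}, & 1 - z \ge \mu,
\end{cases}
\]
```

after which accelerated methods for smooth problems become applicable, at the price of an O(μ) approximation error.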
Conditional Accelerated Lazy Stochastic Gradient Descent
In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with an optimal number of calls to a stochastic first-order oracle and a convergence rate of O(1/ε²), improving over the projection-free, Online Frank-Wolfe-based stochastic gradient descent of Hazan and Kale [2012], whose convergence rate is O(1/ε⁴).
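For a rough sense of scale (ignoring constants and problem-dependent factors), reaching accuracy ε = 10⁻² then takes on the order of 1/ε² = 10⁴ stochastic-oracle calls rather than 1/ε⁴ = 10⁸.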