Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration
Authors
Abstract
Nonconvex stochastic optimization problems arise in many machine learning applications, including deep learning. The stochastic gradient Hamiltonian Monte Carlo (SGHMC) method is a variant of the stochastic gradient with momentum method in which controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates toward a global minimum. SGHMC has shown empirical success in practice for solving nonconvex problems. In "Global convergence of stochastic gradient Hamiltonian Monte Carlo for nonconvex stochastic optimization: Nonasymptotic performance bounds and momentum-based acceleration," Gao, Gürbüzbalaban, and Zhu provide, for the first time, finite-time performance bounds for the global convergence of SGHMC in the context of both population risk minimization and empirical risk minimization, and show that acceleration with momentum is possible in nonconvex stochastic optimization.
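The SGHMC iteration described above can be viewed as a discretization of underdamped Langevin dynamics. Below is a minimal sketch of such an update in Python; the function name grad_estimate and the values of the step size eta, friction gamma, and inverse temperature beta are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sghmc(grad_estimate, x0, eta=1e-3, gamma=1.0, beta=1e3, n_iters=10_000, rng=None):
    """Sketch of SGHMC: stochastic gradient with momentum plus scaled Gaussian noise.

    grad_estimate(x) should return an unbiased stochastic estimate of the gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)                              # momentum variable
    noise_scale = np.sqrt(2.0 * gamma * eta / beta)   # properly scaled Gaussian noise
    for _ in range(n_iters):
        g = grad_estimate(x)                          # stochastic gradient
        v = v - eta * (gamma * v + g) + noise_scale * rng.standard_normal(x.shape)
        x = x + eta * v                               # position update
    return x

# Toy usage: noisy gradients of f(x) = ||x||^2; iterates are steered toward the minimum at 0.
x_hat = sghmc(lambda x: 2 * x + 0.1 * np.random.randn(*x.shape), x0=np.ones(3))
```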
Similar Resources
Stochastic Gradient Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient com...
Stochastic Gradient Hamiltonian Monte Carlo
Supplementary Material A. Background on Fokker-Planck Equation. The Fokker-Planck equation (FPE) associated with a given stochastic differential equation (SDE) describes the time evolution of the distribution on the random variables under the specified stochastic dynamics. For example, consider the SDE dz = g(z) dt + N(0, 2D(z) dt) (16), where z ∈ R^n, g(z) ∈ R^n, and D(z) ∈ R^{n×n}. The distribution of z go...
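For reference, the standard Fokker-Planck equation associated with an SDE of this form (the textbook statement, not quoted from the truncated excerpt above) reads:

```latex
% Time evolution of the density p_t(z) for dz = g(z) dt + N(0, 2 D(z) dt)
\frac{\partial p_t(z)}{\partial t}
  = -\sum_{i} \frac{\partial}{\partial z_i}\left[ g_i(z)\, p_t(z) \right]
  + \sum_{i,j} \frac{\partial^2}{\partial z_i \partial z_j}\left[ D_{ij}(z)\, p_t(z) \right]
```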
Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural networks and have received many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provi...
Stochastic Recursive Gradient Algorithm for Nonconvex Optimization
In this paper, we study and analyze the mini-batch version of StochAstic Recursive grAdient algoritHm (SARAH), a method employing the stochastic recursive gradient, for solving empirical loss minimization for the case of nonconvex losses. We provide a sublinear convergence rate (to stationary points) for general nonconvex functions and a linear convergence rate for gradient dominated functions,...
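The recursive gradient estimator at the core of SARAH can be sketched as follows; the helper grad_i and the single-sample inner loop are illustrative assumptions rather than the exact mini-batch variant analyzed in the paper.

```python
import numpy as np

def sarah_epoch(grad_i, w0, n_samples, eta=0.01, inner_steps=100, rng=None):
    """One outer epoch of a SARAH-style method (sketch).

    grad_i(w, i) returns the gradient of the i-th component loss at w.
    """
    rng = np.random.default_rng() if rng is None else rng
    w_prev = np.array(w0, dtype=float)
    # Full gradient at the start of the epoch.
    v = np.mean([grad_i(w_prev, i) for i in range(n_samples)], axis=0)
    w = w_prev - eta * v
    for _ in range(inner_steps):
        i = rng.integers(n_samples)
        # Recursive estimator: correct the previous estimate with a sampled gradient difference.
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - eta * v
    return w
```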
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization
A. Proof of Theorem 1. We first recall the following lemma. Lemma 1 (Lemma 1, Gong et al., 2013). Under Assumption 1.(3), for any η > 0 and any x, y ∈ R^d such that x = prox_{ηg}(y − η∇f(y)), one has that F(x) ≤ F(y) − (1/(2η) − L/2)‖x − y‖^2. Applying Lemma 1 with x = x_k, y = y_k, we obtain that F(x_k) ≤ F(y_k) − (1/(2η) − L/2)‖x_k − y_k‖^2 (12). Since η < 1/L, it follows that F(x_k) ≤ F(y_k). Moreover, ...
Journal
Journal title: Operations Research
Year: 2022
ISSN: 1526-5463, 0030-364X
DOI: https://doi.org/10.1287/opre.2021.2162