SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Authors
Abstract
Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical analysis of convergence of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016) a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds. Moreover, we propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of a diminishing learning rate.

Department of Industrial and Systems Engineering, Lehigh University, USA. Department of Electrical and Computer Engineering, University of Connecticut, USA. KAUST, KSA; Edinburgh, UK; MIPT, Russia. Correspondence to: Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč. Lam M. Nguyen was partially supported by NSF Grant CCF 16-18717. Phuong Ha Nguyen and Marten van Dijk were supported in part by AFOSR MURI under award number FA9550-14-1-0351. Katya Scheinberg was partially supported by NSF Grants CCF 16-18717 and CCF 17-40796. Martin Takáč was supported by the U.S. National Science Foundation under award numbers NSF:CCF:1618717, NSF:CMMI:1663256 and NSF:CCF:1740796.
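For concreteness, the following is a minimal sketch of plain SGD in the diminishing-learning-rate regime discussed in the abstract above. The function name `sgd_diminishing`, the schedule eta_t = eta0 / (1 + t), and the stochastic-gradient oracle `grad_sample` are illustrative assumptions, not the exact step-size choice analyzed in the paper.

```python
import numpy as np

def sgd_diminishing(grad_sample, w0, data, n_epochs=10, eta0=0.1):
    """Plain SGD with a diminishing step size eta_t = eta0 / (1 + t).

    grad_sample(w, xi) is assumed to return an unbiased stochastic gradient
    of the objective at w for the sampled data point xi.
    """
    w = np.asarray(w0, dtype=float).copy()
    t = 0
    for _ in range(n_epochs):
        for idx in np.random.permutation(len(data)):
            eta_t = eta0 / (1.0 + t)  # diminishing learning rate
            w -= eta_t * grad_sample(w, data[idx])
            t += 1
    return w
```

For a least-squares example, `grad_sample(w, (x, y))` could simply return `(w @ x - y) * x`.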
Similar resources
Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implem...
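As a rough illustration of the lock-free update pattern, the sketch below has several workers apply sparse stochastic-gradient updates to a shared parameter vector without any locking. The helper `sparse_grad` is a hypothetical oracle returning only the nonzero gradient coordinates, and Python threads stand in for the shared-memory processors of the original implementation.

```python
import threading
import numpy as np

def hogwild_sketch(sparse_grad, w, samples, n_threads=4, eta=0.01):
    """Lock-free parallel SGD in the spirit of Hogwild! (illustrative only).

    sparse_grad(w, sample) is assumed to return a dict {coordinate: value}
    holding the few nonzero entries of a stochastic gradient.  Workers write
    to the shared vector w with no synchronization; with sparse gradients,
    conflicting writes to the same coordinate are rare.
    """
    def worker(indices):
        for idx in indices:
            g = sparse_grad(w, samples[idx])
            for j, gj in g.items():
                w[j] -= eta * gj  # unsynchronized per-coordinate write

    chunks = np.array_split(np.arange(len(samples)), n_threads)
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```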
CuMF_SGD: Fast and Scalable Matrix Factorization
Matrix factorization (MF) has been widely used in e.g., recommender systems, topic modeling and word embedding. Stochastic gradient descent (SGD) is popular in solving MF problems because it can deal with large data sets and easily supports incremental learning. We observed that SGD for MF is memory bound. Meanwhile, single-node CPU systems with caching perform well only for small data sets; dis...
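The basic SGD update for matrix factorization that systems such as CuMF_SGD parallelize can be sketched as follows; this is a generic single-threaded version with assumed parameter names (`rank`, `eta`, `lam`), not that system's GPU implementation.

```python
import numpy as np

def mf_sgd(ratings, n_users, n_items, rank=16, eta=0.01, lam=0.05, n_epochs=20):
    """SGD for matrix factorization: fit R ~= U V^T from observed entries.

    ratings is a list of (user, item, value) triples with integer indices.
    Each observed entry touches one row of U and one row of V, a streaming
    access pattern consistent with the memory-bound behavior noted above.
    """
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - U[u] @ V[i]                      # prediction error on this entry
            old_Uu = U[u].copy()
            U[u] += eta * (err * V[i] - lam * U[u])    # regularized SGD step for the user row
            V[i] += eta * (err * old_Uu - lam * V[i])  # and for the item row
    return U, V
```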
Fast Asynchronous Parallel Stochastic Gradient Descent
Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD methods for multicore systems. However, existing parallel SGD methods cannot achieve satisfactory performance in real applications. In this paper, we propose a f...
Lock-Free Optimization for Non-Convex Problems
Stochastic gradient descent (SGD) and its variants have attracted much attention in machine learning due to their efficiency and effectiveness for optimization. To handle large-scale problems, researchers have recently proposed several lock-free strategy based parallel SGD (LF-PSGD) methods for multi-core systems. However, existing works have only proved the convergence of these LF-PSGD methods ...
Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee
Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD methods for multicore systems. However, existing parallel SGD methods cannot achieve satisfactory performance in real applications. In this paper, we propose a f...
Journal: CoRR
Volume: abs/1802.03801
Pages: -
Publication date: 2018