L-sr1: a Second Order Optimization Method for Deep Learning
نویسندگان
چکیده
We describe L-SR1, a new second order method to train deep neural networks. Second order methods hold great promise for distributed training of deep networks. Unfortunately, they have not proven practical. Two significant barriers to their success are inappropriate handling of saddle points, and poor conditioning of the Hessian. L-SR1 is a practical second order method that addresses these concerns. We provide experimental results showing that L-SR1 performs at least as well as Nesterov’s Accelerated Gradient Descent, on the MNIST and CIFAR10 datasets. For the CIFAR10 dataset, we see competitive performance on shallow networks like LeNet5, as well as on deeper networks like residual networks. Furthermore, we perform an experimental analysis of L-SR1 with respect to its hyperparameters to gain greater intuition. Finally, we outline the potential usefulness of L-SR1 in distributed training of deep neural networks.
منابع مشابه
A Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...
متن کاملA Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...
متن کاملOptimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a su...
متن کاملA symmetric rank-one quasi-Newton line-search method using negative curvature directions
We propose a quasi-Newton line-search method that uses negative curvature directions for solving unconstrained optimization problems. In this method, the symmetric rank-one (SR1) rule is used to update the Hessian approximation. The SR1 update rule is known to have a good numerical performance; however, it does not guarantee positive definiteness of the updated matrix. We first discuss the deta...
متن کاملA Discrete Hybrid Teaching-Learning-Based Optimization algorithm for optimization of space trusses
In this study, to enhance the optimization process, especially in the structural engineering field two well-known algorithms are merged together in order to achieve an improved hybrid algorithm. These two algorithms are Teaching-Learning Based Optimization (TLBO) and Harmony Search (HS) which have been used by most researchers in varied fields of science. The hybridized algorithm is called A Di...
متن کامل