When Does Stochastic Gradient Algorithm Work Well?
Authors
Abstract
In this paper, we consider a general stochastic optimization problem that lies at the core of many supervised learning tasks, such as deep learning and linear classification. We analyze a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function under which this method achieves improved convergence rates (to a neighborhood of the optimal solutions). We then empirically demonstrate that this assumption holds for logistic regression and for standard deep neural networks on classical data sets. Our analysis thus helps to explain when efficient behavior can be expected from SGD when training classification models and deep neural networks.
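The object of study is plain SGD with a constant step size. As a rough, self-contained illustration only (not the paper's experimental setup), the sketch below runs fixed-step SGD on a small synthetic logistic-regression problem; the data sizes and the step size `eta` are arbitrary choices, and the iterate is expected to settle into a neighborhood of the optimum rather than converge exactly.

```python
import numpy as np

# Minimal fixed-step-size SGD on logistic regression with synthetic data.
# Illustrative sketch only, not the authors' experimental setup.
rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def stochastic_grad(w, i):
    """Gradient of the logistic loss on a single example i."""
    p = 1.0 / (1.0 + np.exp(-X[i] @ w))
    return (p - y[i]) * X[i]

w = np.zeros(d)
eta = 0.2                              # fixed, relatively large step size (arbitrary choice)
for t in range(20000):
    i = rng.integers(n)                # sample one example uniformly
    w -= eta * stochastic_grad(w, i)   # SGD update with a constant step

print("distance to w_true:", np.linalg.norm(w - w_true))
```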
Similar Resources
Online Local Gain Adaptation for Multi-Layer Perceptrons
We introduce a new method for adapting the step size of each individual weight in a multi-layer perceptron trained by stochastic gradient descent. Our technique derives from the K1 algorithm for linear systems (Sutton, 1992b), which in turn is based on a diagonalized Kalman Filter. We expand upon Sutton’s work in two regards: K1 is a) extended to multi-layer perceptrons, and b) made more effici...
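The K1 algorithm and its multi-layer extension are not reproduced here. As a loose illustration of the same idea, adapting a separate step size for every weight online, the sketch below uses Sutton's IDBD rule as a stand-in on a linear regression stream; the problem, the meta step size `theta`, and the initial gains are arbitrary choices.

```python
import numpy as np

# Per-weight step-size adaptation on a linear model, in the spirit of the
# gain-adaptation methods above. Uses the IDBD rule as a stand-in; this is
# NOT the K1 algorithm or its multi-layer-perceptron extension.
rng = np.random.default_rng(1)
n, d = 5000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
beta = np.full(d, np.log(0.01))   # log of the per-weight step sizes
h = np.zeros(d)                   # memory trace used by the meta-update
theta = 0.01                      # meta step size (arbitrary choice)

for t in range(n):
    x = X[t]
    delta = y[t] - x @ w                  # prediction error on this example
    beta += theta * delta * x * h         # meta-update of the log step sizes
    alpha = np.exp(beta)                  # current per-weight step sizes
    w += alpha * delta * x                # per-weight LMS update
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x

print("distance to w_true:", np.linalg.norm(w - w_true))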
Investigations on Hessian-Free Optimization for Cross-Entropy Training of Deep Neural Networks
Context-dependent deep neural network HMMs have been shown to achieve recognition accuracy superior to Gaussian mixture models in a number of recent works. Typically, neural networks are optimized with stochastic gradient descent. On large datasets, stochastic gradient descent improves quickly during the beginning of the optimization. But since it does not make use of second order information, ...
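As a toy illustration of the second-order idea (not the cited Hessian-free training pipeline, which works with Gauss-Newton matrices and exact curvature-vector products), the sketch below takes one truncated-Newton step on a synthetic quadratic: the Newton system is solved approximately by conjugate gradients using only Hessian-vector products, so the Hessian is never formed or inverted explicitly in the solver.

```python
import numpy as np

# Toy truncated-Newton ("Hessian-free") step: solve H p = -g approximately
# with conjugate gradients, using only Hessian-vector products.
rng = np.random.default_rng(2)
d = 50
A = rng.normal(size=(d, d))
H = A @ A.T + np.eye(d)            # SPD Hessian of a synthetic quadratic
b = rng.normal(size=d)
grad = lambda w: H @ w - b         # gradient of 0.5 w'Hw - b'w
hess_vec = lambda v: H @ v         # in practice: an autodiff Hv product

def cg(hv, g, iters=25):
    """Approximately solve H p = -g with conjugate gradients."""
    p = np.zeros_like(g)
    r = -g.copy()                  # residual at p = 0
    s = r.copy()                   # search direction
    for _ in range(iters):
        Hs = hv(s)
        alpha = (r @ r) / (s @ Hs)
        p += alpha * s
        r_new = r - alpha * Hs
        beta = (r_new @ r_new) / (r @ r)
        s = r_new + beta * s
        r = r_new
    return p

w = np.zeros(d)
w = w + cg(hess_vec, grad(w))
print("gradient norm after one HF step:", np.linalg.norm(grad(w)))
```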
The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from...
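To make the asynchronous shared-memory setting concrete, here is a rough Hogwild-style sketch, offered as an assumption about the setting rather than the paper's model or proof construction: several threads read and update one shared parameter vector without any locking, so each stochastic gradient may be computed from stale, partially updated parameters.

```python
import threading
import numpy as np

# Rough Hogwild-style sketch: several threads run SGD on a shared parameter
# vector without locks, so updates may be based on stale values.
rng = np.random.default_rng(3)
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)                    # shared state, updated lock-free
eta = 0.001

def worker(seed, steps=20000):
    local_rng = np.random.default_rng(seed)   # per-thread RNG
    for _ in range(steps):
        i = local_rng.integers(n)
        g = (X[i] @ w - y[i]) * X[i]   # least-squares stochastic gradient
        w[:] -= eta * g                # in-place, unsynchronized update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("distance to w_true:", np.linalg.norm(w - w_true))
```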
SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its practical variant SARAH+, as a novel approach to the finite-sum minimization problems. Different from the vanilla SGD and other modern stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple recursive framework for updating stochastic gradient estimates; when comparing to SAG/SAGA, SAR...
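The distinguishing piece of SARAH is its recursive gradient estimate: each outer loop starts from a full gradient, and each inner iteration updates the estimate as v_t = ∇f_{i_t}(w_t) − ∇f_{i_t}(w_{t−1}) + v_{t−1} before taking the step w_{t+1} = w_t − η v_t. The sketch below follows that recursion on a toy least-squares finite sum; the problem, step size, and loop lengths are illustrative choices, not the paper's.

```python
import numpy as np

# Sketch of the SARAH recursion v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
# on a toy least-squares finite sum. Hyperparameters are illustrative only.
rng = np.random.default_rng(4)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def grad_i(w, i):                       # gradient of the i-th component
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    return X.T @ (X @ w - y) / n

w = np.zeros(d)
eta = 0.05
for epoch in range(20):                 # outer loops
    v = full_grad(w)                    # v_0: full gradient at w_0
    w_prev, w = w, w - eta * v          # w_1 = w_0 - eta * v_0
    for t in range(n):                  # inner loop
        i = rng.integers(n)
        v = grad_i(w, i) - grad_i(w_prev, i) + v   # SARAH recursion
        w_prev, w = w, w - eta * v

print("distance to w_true:", np.linalg.norm(w - w_true))
```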
Online Manifold Regularization: A New Learning Setting and Empirical Study
We consider a novel “online semi-supervised learning” setting where (mostly unlabeled) data arrives sequentially in large volume, and it is impractical to store it all before learning. We propose an online manifold regularization algorithm. It differs from standard online learning in that it learns even when the input point is unlabeled. Our algorithm is based on convex programming in kernel sp...
Journal: CoRR
Volume: abs/1801.06159
Year: 2018