Learning Gradient Descent: Better Generalization and Longer Horizons

نویسندگان

  • Kaifeng Lv
  • Shunhua Jiang
  • Jian Li
چکیده

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and time consuming. Recently, researchers have tried to use deep learning algorithms to exploit the landscape of the loss function of the training problem of interest, and learn how to optimize over it in an automatic way. In this paper, we propose a new learning-to-learn model and some useful and practical tricks. Our optimizer outperforms generic, hand-crafted optimization algorithms and state-of-the-art learning-to-learn optimizers by DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a number of tasks, including deep MLPs, CNNs, and simple LSTMs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Train longer, generalize better: closing the generalization gap in large batch training of neural networks

Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance known as the "generalization gap" phenomena. Identifying...

متن کامل

Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent

A major challenge in understanding the generalization of deep learning is to explain why (stochastic) gradient descent can exploit the network architecture to find solutions that have good generalization performance when using high capacity models. We find simple but realistic examples showing that this phenomenon exists even when learning linear classifiers — between two linear networks with t...

متن کامل

Learning by Online Gradient Descent

We study online gradient{descent learning in multilayer networks analytically and numerically. The training is based on randomly drawn inputs and their corresponding outputs as deened by a target rule. In the thermo-dynamic limit we derive deterministic diierential equations for the order parameters of the problem which allow an exact calculation of the evolution of the generalization error. Fi...

متن کامل

A New Rule-weight Learning Method based on Gradient Descent

In this paper, we propose a simple and efficient method to construct an accurate fuzzy classification system. In order to optimize the generalization accuracy, we use ruleweight as a simple mechanism to tune the classifier and propose a new learning method to iteratively adjust the weight of fuzzy rules. The rule-weights in the proposed method are derived by solving the minimization problem thr...

متن کامل

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of Back propagation algorithm with changing training patterns and the second momentum term in feed forward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017