Search results for: gradient descent algorithm

Number of results: 869,527

1995
David P. Helmbold Jyrki Kivinen Manfred K. Warmuth

We analyze and compare the well-known Gradient Descent algorithm and a new algorithm, called the Exponentiated Gradient algorithm, for training a single neuron with an arbitrary transfer function. Both algorithms are easily generalized to larger neural networks, and the generalization of Gradient Descent is the standard back-propagation algorithm. In this paper we prove worst-case loss bounds f...
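
The abstract contrasts two update rules for a single neuron. A minimal sketch of both, assuming squared loss and a sigmoid transfer function (the paper allows an arbitrary transfer function, and all names below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_update(w, x, y, lr=0.1):
    """One Gradient Descent step for a single sigmoid neuron under squared loss."""
    yhat = sigmoid(w @ x)
    grad = (yhat - y) * yhat * (1 - yhat) * x   # dL/dw
    return w - lr * grad

def eg_update(w, x, y, lr=0.1):
    """One Exponentiated Gradient step: multiplicative update, then renormalization."""
    yhat = sigmoid(w @ x)
    grad = (yhat - y) * yhat * (1 - yhat) * x
    w_new = w * np.exp(-lr * grad)
    return w_new / w_new.sum()                  # keep the weights on the probability simplex
```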

2009
Simon Setzer

We examine relations between popular variational methods in image processing and classical operator splitting methods in convex analysis. We focus on a gradient descent reprojection algorithm for image denoising and the recently proposed Split Bregman and alternating Split Bregman methods. By identifying the latter with the so-called Douglas-Rachford splitting algorithm we can guarantee its conv...
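
The identification mentioned in the abstract hinges on the Douglas-Rachford splitting iteration for minimizing a sum f + g. A generic sketch of that iteration, with the proximal maps left as placeholders rather than the denoising operators from the paper:

```python
def douglas_rachford(prox_f, prox_g, y0, n_iter=100):
    """Generic Douglas-Rachford splitting for min_x f(x) + g(x).

    prox_f, prox_g: proximal maps of f and g (already scaled by the step size).
    """
    y = y0
    for _ in range(n_iter):
        x = prox_f(y)
        y = y + prox_g(2 * x - y) - x   # reflected step on g, then averaging
    return prox_f(y)
```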

Given the importance of conjugate gradient methods for large-scale optimization, this study proposes a descent three-term conjugate gradient method based on an extended modified secant condition. In the proposed method, objective function values are used in addition to the gradient information. It is also established that the method is globally convergent without convexity assu...
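
For orientation, a plain nonlinear conjugate gradient loop (Polak-Ribière+ variant with a fixed step size) is sketched below; the paper's three-term search direction and its secant-based parameters are not reproduced here:

```python
import numpy as np

def nonlinear_cg(grad, x0, lr=1e-2, n_iter=200):
    """Plain Polak-Ribiere+ nonlinear CG with a fixed step size (illustrative only)."""
    x = x0
    g = grad(x)
    d = -g
    for _ in range(n_iter):
        x = x + lr * d
        g_new = grad(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ rule; restarts when beta < 0
        d = -g_new + beta * d
        g = g_new
    return x
```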

Journal: :SIAM Journal on Optimization 2013
William W. Hager Hongchao Zhang

In theory, the successive gradients generated by the conjugate gradient method applied to a quadratic should be orthogonal. However, for some ill-conditioned problems, orthogonality is quickly lost due to rounding errors, and convergence is much slower than expected. A limited memory version of the nonlinear conjugate gradient method is developed. The memory is used to both detect the loss of o...
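
The loss of orthogonality described above can be observed by running textbook conjugate gradient on an ill-conditioned quadratic and monitoring the inner products of successive residuals, which vanish in exact arithmetic. A small sketch of that experiment, not the limited-memory method of the paper:

```python
import numpy as np

def cg_orthogonality_demo(A, b, n_iter=50):
    """Textbook CG for A x = b; prints |<r_{k+1}, r_k>|, which is zero in exact arithmetic."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    for k in range(n_iter):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        print(k, abs(r_new @ r))        # grows visibly when A is ill-conditioned
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x
```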

Journal: :Soft Comput. 2015
Michalis Mavrovouniotis Shengxiang Yang

Feed-forward neural networks are commonly used for pattern classification. The classification accuracy of feed-forward neural networks depends on the configuration selected and the training process. Once the architecture of the network is decided, training algorithms, usually gradient descent techniques, are used to determine the connection weights of the feed-forward neural network. However, g...
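
A minimal sketch of the training step the abstract refers to: with the architecture fixed, the connection weights are adjusted by gradient descent on the training loss. The one-hidden-layer network, tanh units, and squared loss below are illustrative assumptions:

```python
import numpy as np

def train_step(W1, W2, x, y, lr=0.1):
    """One gradient-descent update of a one-hidden-layer network with tanh units and squared loss."""
    h = np.tanh(W1 @ x)          # hidden activations
    yhat = W2 @ h                # linear output layer
    err = yhat - y
    grad_W2 = np.outer(err, h)
    grad_W1 = np.outer((W2.T @ err) * (1 - h**2), x)   # backpropagated gradient
    return W1 - lr * grad_W1, W2 - lr * grad_W2
```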

2000
M. Onder Efe Okyay Kaynak

This paper presents a method for stabilizing and robustifying artificial neural networks trained by gradient descent. The proposed method constructs a dynamic model of the conventional update mechanism and derives the stabilizing values of the learning rate. Stability in this context corresponds to convergence of the adjustable parameters of the neural network structure. I...
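
The notion of stability here amounts to keeping the learning rate inside the region where the update dynamics converge. A minimal illustration using the classical condition that gradient descent on an L-smooth loss is stable for learning rates below 2/L; the paper's own bound for neural network training is not reproduced here:

```python
def stable_gradient_descent(grad, x0, lipschitz_L, n_iter=100):
    """Gradient descent with a learning rate chosen inside the classical stability region (0, 2/L)."""
    lr = 1.0 / lipschitz_L       # safely below the 2/L divergence threshold
    x = x0
    for _ in range(n_iter):
        x = x - lr * grad(x)
    return x
```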

2010
Yunpeng Cai Yijun Sun Yubo Cheng Jian Li Steve Goodison

With the advent of high-throughput technologies, l1-regularized learning algorithms have attracted much attention recently. Dozens of algorithms have been proposed for fast implementation, using various advanced optimization techniques. In this paper, we demonstrate that l1-regularized learning problems can be easily solved by using gradient-descent techniques. The basic idea is to transform a ...
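
The transformation in the abstract is cut off; one standard way to solve an l1-regularized problem with gradient-descent-style iterations is the proximal gradient (ISTA) scheme, sketched below for a least-squares loss. This is a generic illustration, not necessarily the construction used in the paper:

```python
import numpy as np

def ista(A, b, lam, lr, n_iter=500):
    """Proximal gradient for min_w 0.5*||A w - b||^2 + lam*||w||_1."""
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ w - b)                                    # gradient of the smooth part
        z = w - lr * g                                           # plain gradient step (lr <= 1/||A^T A||)
        w = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)   # soft-thresholding prox of the l1 term
    return w
```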

Journal: :CoRR 2011
Wei Xu

For large scale learning problems, it is desirable if we can obtain the optimal model parameters by going through the data in only one pass. Polyak and Juditsky (1992) showed that asymptotically the test performance of the simple average of the parameters obtained by stochastic gradient descent (SGD) is as good as that of the parameters which minimize the empirical cost. However, to our knowled...
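
A minimal sketch of the averaging scheme the abstract refers to: run plain SGD over a single pass of the data and return the running average of the iterates rather than the final iterate. The gradient oracle and step size below are placeholders:

```python
import numpy as np

def averaged_sgd(grad_sample, x0, data, lr=0.01):
    """One pass of SGD with iterate averaging (Polyak-Juditsky style)."""
    x = x0
    x_bar = np.zeros_like(x0)
    for t, sample in enumerate(data, start=1):
        x = x - lr * grad_sample(x, sample)      # stochastic gradient step
        x_bar += (x - x_bar) / t                 # running average of the iterates
    return x_bar
```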

2014
Mohammad Taha Bahadori Yi Chang Bo Long Yan Liu

In this paper, we propose to study the problem of heterogeneous transfer ranking, a transfer learning problem with heterogeneous features in order to utilize the rich large-scale labeled data in popular languages to help the ranking task in less popular languages. We develop a large-margin algorithm, namely LM-HTR, to solve the problem by mapping the input features in both the source domain and...

Journal: :CoRR 2013
Shenghuo Zhu

With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
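
A sketch of the weighting scheme the abstract describes: SGD iterates are combined with weights proportional to t, so later iterates dominate a uniform average. The step-size schedule below is a common choice for strongly convex objectives, not necessarily the one analyzed in the paper:

```python
import numpy as np

def weighted_sgd(grad_sample, x0, data, mu):
    """SGD for a mu-strongly-convex objective, returning an average with weights proportional to t."""
    x = x0
    x_weighted = np.zeros_like(x0)
    weight_sum = 0.0
    for t, sample in enumerate(data, start=1):
        lr = 2.0 / (mu * (t + 1))                # common step-size choice for strongly convex SGD
        x = x - lr * grad_sample(x, sample)
        x_weighted += t * x                      # weight proportional to t
        weight_sum += t
    return x_weighted / weight_sum
```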
