Search results for: gradient descent algorithm

Number of results: 869,527

1995
David P. Helmbold Jyrki Kivinen Manfred K. Warmuth

We analyze and compare the well-known Gradient Descent algorithm and a new algorithm, called the Exponentiated Gradient algorithm, for training a single neuron with an arbitrary transfer function. Both algorithms are easily generalized to larger neural networks, and the generalization of Gradient Descent is the standard back-propagation algorithm. In this paper we prove worst-case loss bounds f...
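
The abstract contrasts two update rules for a single neuron. A minimal sketch of both, assuming squared loss and a sigmoid transfer function (the paper allows an arbitrary transfer function, and all names below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_update(w, x, y, lr=0.1):
    """One Gradient Descent step for a single sigmoid neuron under squared loss."""
    yhat = sigmoid(w @ x)
    grad = (yhat - y) * yhat * (1 - yhat) * x   # dL/dw
    return w - lr * grad

def eg_update(w, x, y, lr=0.1):
    """One Exponentiated Gradient step: multiplicative update, then renormalization."""
    yhat = sigmoid(w @ x)
    grad = (yhat - y) * yhat * (1 - yhat) * x
    w_new = w * np.exp(-lr * grad)
    return w_new / w_new.sum()                  # keep the weights on the probability simplex
```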

2009
Simon Setzer

We examine relations between popular variational methods in image processing and classical operator splitting methods in convex analysis. We focus on a gradient descent reprojection algorithm for image denoising and the recently proposed Split Bregman and alternating Split Bregman methods. By identifying the latter with the so-called Douglas-Rachford splitting algorithm we can guarantee its conv...
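
The identification mentioned in the abstract hinges on the Douglas-Rachford splitting iteration for minimizing a sum f + g. A generic sketch of that iteration, with the proximal maps left as placeholders rather than the denoising operators from the paper:

```python
def douglas_rachford(prox_f, prox_g, y0, n_iter=100):
    """Generic Douglas-Rachford splitting for min_x f(x) + g(x).

    prox_f, prox_g: proximal maps of f and g (already scaled by the step size).
    """
    y = y0
    for _ in range(n_iter):
        x = prox_f(y)
        y = y + prox_g(2 * x - y) - x   # reflected step on g, then averaging
    return prox_f(y)
```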

Given the importance of conjugate gradient methods for large-scale optimization, this study proposes a descent three-term conjugate gradient method based on an extended modified secant condition. In the proposed method, objective function values are used in addition to the gradient information. It is also established that the method is globally convergent without convexity assu...
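
For orientation, a plain nonlinear conjugate gradient loop (Polak-Ribière+ variant with a fixed step size) is sketched below; the paper's three-term search direction and its secant-based parameters are not reproduced here:

```python
import numpy as np

def nonlinear_cg(grad, x0, lr=1e-2, n_iter=200):
    """Plain Polak-Ribiere+ nonlinear CG with a fixed step size (illustrative only)."""
    x = x0
    g = grad(x)
    d = -g
    for _ in range(n_iter):
        x = x + lr * d
        g_new = grad(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ rule; restarts when beta < 0
        d = -g_new + beta * d
        g = g_new
    return x
```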

Journal: :SIAM Journal on Optimization 2013
William W. Hager Hongchao Zhang

In theory, the successive gradients generated by the conjugate gradient method applied to a quadratic should be orthogonal. However, for some ill-conditioned problems, orthogonality is quickly lost due to rounding errors, and convergence is much slower than expected. A limited memory version of the nonlinear conjugate gradient method is developed. The memory is used to both detect the loss of o...
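
The loss of orthogonality described above can be observed by running textbook conjugate gradient on an ill-conditioned quadratic and monitoring the inner products of successive residuals, which vanish in exact arithmetic. A small sketch of that experiment, not the limited-memory method of the paper:

```python
import numpy as np

def cg_orthogonality_demo(A, b, n_iter=50):
    """Textbook CG for A x = b; prints |<r_{k+1}, r_k>|, which is zero in exact arithmetic."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    for k in range(n_iter):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        print(k, abs(r_new @ r))        # grows visibly when A is ill-conditioned
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x
```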

Journal: :Soft Comput. 2015
Michalis Mavrovouniotis Shengxiang Yang

Feed-forward neural networks are commonly used for pattern classification. The classification accuracy of feed-forward neural networks depends on the configuration selected and the training process. Once the architecture of the network is decided, training algorithms, usually gradient descent techniques, are used to determine the connection weights of the feed-forward neural network. However, g...
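
A minimal sketch of the training step the abstract refers to: with the architecture fixed, the connection weights are adjusted by gradient descent on the training loss. The one-hidden-layer network, tanh units, and squared loss below are illustrative assumptions:

```python
import numpy as np

def train_step(W1, W2, x, y, lr=0.1):
    """One gradient-descent update of a one-hidden-layer network with tanh units and squared loss."""
    h = np.tanh(W1 @ x)          # hidden activations
    yhat = W2 @ h                # linear output layer
    err = yhat - y
    grad_W2 = np.outer(err, h)
    grad_W1 = np.outer((W2.T @ err) * (1 - h**2), x)   # backpropagated gradient
    return W1 - lr * grad_W1, W2 - lr * grad_W2
```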

2000
M. Onder Efe Okyay Kaynak

This paper presents a method for stabilizing and robustifying artificial neural networks trained by gradient descent. The proposed method constructs a dynamic model of the conventional update mechanism and derives the stabilizing values of the learning rate. Stability in this context corresponds to convergence of the adjustable parameters of the neural network structure. I...
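
The notion of stability here amounts to keeping the learning rate inside the region where the update dynamics converge. A minimal illustration using the classical condition that gradient descent on an L-smooth loss is stable for learning rates below 2/L; the paper's own bound for neural network training is not reproduced here:

```python
def stable_gradient_descent(grad, x0, lipschitz_L, n_iter=100):
    """Gradient descent with a learning rate chosen inside the classical stability region (0, 2/L)."""
    lr = 1.0 / lipschitz_L       # safely below the 2/L divergence threshold
    x = x0
    for _ in range(n_iter):
        x = x - lr * grad(x)
    return x
```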

2010
Yunpeng Cai Yijun Sun Yubo Cheng Jian Li Steve Goodison

With the advent of high-throughput technologies, l1-regularized learning algorithms have attracted much attention recently. Dozens of algorithms have been proposed for fast implementation, using various advanced optimization techniques. In this paper, we demonstrate that l1-regularized learning problems can be easily solved by using gradient-descent techniques. The basic idea is to transform a ...
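
The transformation in the abstract is cut off; one standard way to solve an l1-regularized problem with gradient-descent-style iterations is the proximal gradient (ISTA) scheme, sketched below for a least-squares loss. This is a generic illustration, not necessarily the construction used in the paper:

```python
import numpy as np

def ista(A, b, lam, lr, n_iter=500):
    """Proximal gradient for min_w 0.5*||A w - b||^2 + lam*||w||_1."""
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ w - b)                                    # gradient of the smooth part
        z = w - lr * g                                           # plain gradient step (lr <= 1/||A^T A||)
        w = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)   # soft-thresholding prox of the l1 term
    return w
```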

Journal: :CoRR 2011
Wei Xu

For large scale learning problems, it is desirable if we can obtain the optimal model parameters by going through the data in only one pass. Polyak and Juditsky (1992) showed that asymptotically the test performance of the simple average of the parameters obtained by stochastic gradient descent (SGD) is as good as that of the parameters which minimize the empirical cost. However, to our knowled...
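
A minimal sketch of the averaging scheme the abstract refers to: run plain SGD over a single pass of the data and return the running average of the iterates rather than the final iterate. The gradient oracle and step size below are placeholders:

```python
import numpy as np

def averaged_sgd(grad_sample, x0, data, lr=0.01):
    """One pass of SGD with iterate averaging (Polyak-Juditsky style)."""
    x = x0
    x_bar = np.zeros_like(x0)
    for t, sample in enumerate(data, start=1):
        x = x - lr * grad_sample(x, sample)      # stochastic gradient step
        x_bar += (x - x_bar) / t                 # running average of the iterates
    return x_bar
```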

2014
Mohammad Taha Bahadori Yi Chang Bo Long Yan Liu

In this paper, we propose to study the problem of heterogeneous transfer ranking, a transfer learning problem with heterogeneous features in order to utilize the rich large-scale labeled data in popular languages to help the ranking task in less popular languages. We develop a large-margin algorithm, namely LM-HTR, to solve the problem by mapping the input features in both the source domain and...

Journal: :CoRR 2013
Shenghuo Zhu

With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
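
A sketch of the weighting scheme the abstract describes: SGD iterates are combined with weights proportional to t, so later iterates dominate a uniform average. The step-size schedule below is a common choice for strongly convex objectives, not necessarily the one analyzed in the paper:

```python
import numpy as np

def weighted_sgd(grad_sample, x0, data, mu):
    """SGD for a mu-strongly-convex objective, returning an average with weights proportional to t."""
    x = x0
    x_weighted = np.zeros_like(x0)
    weight_sum = 0.0
    for t, sample in enumerate(data, start=1):
        lr = 2.0 / (mu * (t + 1))                # common step-size choice for strongly convex SGD
        x = x - lr * grad_sample(x, sample)
        x_weighted += t * x                      # weight proportional to t
        weight_sum += t
    return x_weighted / weight_sum
```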
