Learning rate selection in stochastic gradient methods based on line search strategies
Abstract
Finite-sum problems appear as the sample average approximation of a stochastic optimization problem and often arise in machine learning applications with large scale data sets. A very popular approach to face finite-sum problems is the stochastic gradient method. It is well known that a proper strategy to select the hyperparameters of this method (i.e. the set of a-priori selected parameters) and, in particular, the learning rate, is needed to guarantee convergence properties and good practical performance. In this paper, we analyse standard and line search based updating rules to fix the learning rate sequence, also in relation to the size of the mini batch chosen to compute the current stochastic gradient. An extensive numerical experimentation is carried out in order to evaluate the effectiveness of the discussed strategies for convex and non-convex test problems, highlighting that line search based methods avoid an expensive initial setting of the hyperparameters. The approaches have also been applied to train a Convolutional Neural Network, providing promising results.
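The general idea described in the abstract, choosing the learning rate of a mini-batch stochastic gradient step via a backtracking (Armijo-type) line search on the current mini-batch loss, can be sketched as follows. This is a minimal illustration under generic assumptions, not the specific updating rules analysed in the paper; the function names, the least-squares example, and all parameter values are hypothetical choices for the sketch.

```python
import numpy as np

def sgd_armijo(grad_f, f, w0, data, n_epochs=20, batch_size=32,
               alpha0=1.0, c=0.5, rho=0.5, seed=0):
    """Mini-batch SGD where each learning rate is chosen by a
    backtracking (Armijo-type) line search on the mini-batch loss.
    Generic sketch; not the paper's exact updating rules."""
    X, y = data
    rng = np.random.default_rng(seed)
    w = w0.copy()
    n = len(X)
    for _ in range(n_epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            Xb, yb = X[b], y[b]
            g = grad_f(w, Xb, yb)          # mini-batch gradient
            fb = f(w, Xb, yb)              # mini-batch loss at w
            alpha = alpha0
            # Backtrack until the sufficient-decrease (Armijo)
            # condition holds on the same mini-batch.
            while f(w - alpha * g, Xb, yb) > fb - c * alpha * (g @ g):
                alpha *= rho
                if alpha < 1e-8:
                    break
            w -= alpha * g
    return w

# Hypothetical test problem: least-squares loss and its gradient.
def f(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad_f(w, X, y):
    return X.T @ (X @ w - y) / len(y)
```

One appeal of such rules, as the abstract notes, is that the initial learning rate `alpha0` need not be carefully tuned a priori: overly large trial steps are simply rejected by the backtracking loop.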
Similar resources
Learning Rate Adaptation in Stochastic Gradient Descent
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective gives some advantage to the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization metho...
The Role of Vocabulary Learning Strategies on Vocabulary Retention and on Language Proficiency in Iranian EFL Students
In recent years, second language teaching has sought better methods for realizing the goals of teachers and students. On the teachers' side, this has led to research on linguistic, conversational, and interactional structure. On the students' side, it has led to studies of learners' attitudes towards learning inside and outside the classroom, as well as of the various kinds of mental processing involved. The aim of this research is to find methods that...
Chapter 2 LEARNING RATE ADAPTATION IN STOCHASTIC GRADIENT DESCENT
Global Convergence of Conjugate Gradient Methods without Line Search
Global convergence results are derived for well-known conjugate gradient methods in which the line search step is replaced by a step whose length is determined by a formula. The results include the following cases: 1. The Fletcher-Reeves method, the Hestenes-Stiefel method, and the Dai-Yuan method applied to a strongly convex LC objective function; 2. The Polak-Ribière method and the Conjugate ...
Nonlinear Conjugate Gradient Methods with Wolfe Type Line Search
$\dots = \frac{\|d_{k-1}\|^{2}}{\|g_{k-1}\|^{4}} + \frac{1}{\|g_{k}\|^{2}} - \frac{\beta_{k}^{2}\,(g_{k}^{T} d_{k-1})^{2}}{\|g_{k}\|^{4}}$
Journal
Journal title: Applied mathematics in science and engineering
Year: 2023
ISSN: 2769-0911
DOI: https://doi.org/10.1080/27690911.2022.2164000