Beyond convexity—Contraction and global convergence of gradient descent
Authors
Abstract
Similar resources
Convergence Analysis of Gradient Descent Stochastic Algorithms
This paper proves convergence of a sample-path based stochastic gradient-descent algorithm for optimizing expected-value performance measures in discrete event systems. The algorithm uses increasing precision at successive iterations, and it moves against the direction of a generalized gradient of the computed sample performance function. Two convergence results are established: one, for the ca...
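A minimal sketch of the increasing-precision idea, assuming a smooth toy objective in place of the paper's generalized gradients of sample performance functions (the objective, noise model, and sample schedule below are illustrative assumptions):

import numpy as np

def sample_gradient(x, rng):
    # Stand-in for a sample-path gradient estimate: the true gradient of
    # 0.5 * x**2 corrupted by zero-mean noise.
    return x + rng.normal(scale=1.0)

def increasing_precision_descent(x0, iters=60, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    for k in range(1, iters + 1):
        # Increasing precision: average k noisy samples at iteration k,
        # so later iterations use more accurate gradient estimates.
        g = np.mean([sample_gradient(x, rng) for _ in range(k)])
        x -= step * g          # move against the estimated gradient
    return x

print(increasing_precision_descent(5.0))   # approaches the minimizer 0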
On the Convergence of Decentralized Gradient Descent
Consider the consensus problem of minimizing $f(x) = \sum_{i=1}^{n} f_i(x)$, where each $f_i$ is known only to one individual agent $i$ belonging to a connected network of $n$ agents. All the agents shall collaboratively solve this problem and obtain the solution via data exchanges only between neighboring agents. Such algorithms avoid the need for a fusion center, offer better network load balance, and improve da...
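A minimal sketch of one standard decentralized gradient descent update (neighbor averaging with a doubly stochastic mixing matrix, followed by a local gradient step); the three-agent path network and quadratic local objectives are illustrative assumptions, not taken from the abstract:

import numpy as np

# Three agents on a path network; agent i privately holds
# f_i(x) = 0.5 * (x - b[i])**2, so the consensus minimizer is b.mean().
b = np.array([1.0, 4.0, 7.0])

# Metropolis mixing matrix for the path 0-1-2: symmetric and doubly
# stochastic, with nonzero weights only between network neighbors.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

x = np.zeros(3)      # each agent's local estimate of the shared variable
alpha = 0.1          # fixed step size
for _ in range(500):
    # DGD update: average with neighbors, then take a local gradient step.
    x = W @ x - alpha * (x - b)
print(x, b.mean())   # with a fixed step, x lands in a neighborhood of 4.0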
Convergence properties of gradient descent noise reduction
Gradient descent noise reduction is a technique that attempts to recover the true signal, or trajectory, from noisy observations of a non-linear dynamical system for which the dynamics are known. This paper provides the first rigorous proof that the algorithm will recover the original trajectory for a broad class of dynamical systems under certain conditions. The proof is obtained using ideas f...
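A minimal sketch of trajectory-based gradient descent noise reduction, assuming the logistic map as the known dynamics and squared dynamical error as the objective (both illustrative choices, not the paper's exact setting):

import numpy as np

A_PARAM = 3.8

def F(x):
    # Known dynamics: the logistic map (an illustrative choice).
    return A_PARAM * x * (1.0 - x)

def dF(x):
    return A_PARAM * (1.0 - 2.0 * x)

def grad_dynamical_error(x):
    # Objective: L(x) = 0.5 * sum_t (x[t+1] - F(x[t]))**2, differentiated
    # with respect to every point of the candidate trajectory x.
    r = x[1:] - F(x[:-1])
    g = np.zeros_like(x)
    g[1:] += r
    g[:-1] -= r * dF(x[:-1])
    return g

rng = np.random.default_rng(0)
true = np.empty(100)
true[0] = 0.3
for t in range(99):
    true[t + 1] = F(true[t])
noisy = true + rng.normal(scale=0.01, size=true.shape)

x = noisy.copy()                      # start from the noisy observations
for _ in range(2000):
    x -= 0.05 * grad_dynamical_error(x)
# The adjusted trajectory should sit closer to a true orbit than the raw data.
print(np.abs(x - true).mean(), np.abs(noisy - true).mean())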
Convergence of Gradient Descent on Separable Data
The implicit bias of gradient descent is not fully understood even in simple linear classification tasks (e.g., logistic regression). Soudry et al. (2018) studied this bias on separable data, where there are multiple solutions that correctly classify the data. It was found that, when optimizing monotonically decreasing loss functions with exponential tails using gradient descent, the linear cla...
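A minimal sketch of the phenomenon the abstract describes, on a hypothetical separable 2-D dataset: under gradient descent on the logistic loss, the norm of the weight vector grows without bound while its direction stabilizes (toward the max-margin separator):

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical linearly separable data: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)),
               rng.normal(-2.0, 0.5, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

w = np.zeros(2)
eta = 0.1
for t in range(20000):
    m = y * (X @ w)                        # margins y_i <w, x_i>
    # Gradient of the logistic loss sum_i log(1 + exp(-m_i)).
    g = -X.T @ (y / (1.0 + np.exp(m)))
    w -= eta * g

# ||w|| keeps growing, but w / ||w|| stabilizes as training continues.
print(np.linalg.norm(w), w / np.linalg.norm(w))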
Convergence of Stochastic Gradient Descent for PCA
We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in $\mathbb{R}^d$. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the ...
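A minimal sketch of SGD for streaming PCA in the style of Oja's rule, with an assumed diagonal covariance and a 1/t step size (illustrative choices, not necessarily the paper's exact algorithm):

import numpy as np

rng = np.random.default_rng(2)
d = 5
evals = np.array([3.0, 1.0, 0.8, 0.5, 0.2])   # assumed covariance spectrum

w = rng.normal(size=d)
w /= np.linalg.norm(w)
for t in range(1, 20001):
    x = rng.normal(size=d) * np.sqrt(evals)   # one streaming sample, cov = diag(evals)
    eta = 1.0 / t                             # decaying step size
    w += eta * x * (x @ w)                    # stochastic step for maximizing w^T C w
    w /= np.linalg.norm(w)                    # keep the iterate on the unit sphere

# |w[0]| near 1 means w has aligned with the top principal direction.
print(abs(w[0]))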
Journal
Journal title: PLOS ONE
Year: 2020
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0236661