Adaptive Sequential Stochastic Optimization
Authors
Abstract
Similar Resources
Optimization by Adaptive Stochastic Descent
When standard optimization methods fail to find a satisfactory solution for a parameter fitting problem, a tempting recourse is to adjust parameters manually. While tedious, this approach can be surprisingly effective at finding optimal or near-optimal solutions. This paper outlines an optimization algorithm, Adaptive Stochastic Descent (ASD), that has been designed to replicate the e...
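For context, here is a minimal Python sketch of the coordinate-wise adaptive idea behind methods like ASD, assuming a simple accept-if-improved rule with multiplicative step-size adaptation; the function name asd_minimize and the grow/shrink factors are illustrative and not the paper's exact algorithm.

import numpy as np

def asd_minimize(f, x0, steps=None, n_iters=1000, grow=2.0, shrink=0.5, rng=None):
    """Coordinate-wise adaptive stochastic descent sketch (illustrative only).

    Perturbs one randomly chosen parameter at a time, keeps the move only if
    the objective improves, and adapts that coordinate's step size (grow on
    success, shrink on failure), loosely mimicking manual parameter tuning.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    steps = np.full_like(x, 0.1) if steps is None else np.asarray(steps, dtype=float).copy()
    fx = f(x)
    for _ in range(n_iters):
        i = rng.integers(len(x))                # pick a coordinate at random
        direction = rng.choice([-1.0, 1.0])     # try a step in a random direction
        candidate = x.copy()
        candidate[i] += direction * steps[i]
        fc = f(candidate)
        if fc < fx:                             # accept only improving moves
            x, fx = candidate, fc
            steps[i] *= grow                    # be bolder on this coordinate
        else:
            steps[i] *= shrink                  # back off on this coordinate
    return x, fx

# Example: minimize a simple quadratic
x_best, f_best = asd_minimize(lambda x: np.sum((x - 3.0) ** 2), x0=np.zeros(4))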
AdaDelay: Delay Adaptive Distributed Stochastic Optimization
We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients. Our setup is motivated by the behavior of real-world distributed computation systems; in particular, we analyze a setting wherein worker nodes can be dif...
AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization
We study distributed stochastic convex optimization under the delayed gradient model, where the server nodes perform parameter updates while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are differently slow at different times. Therefore, we ...
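A toy single-process simulation of the delayed-gradient setting described above, assuming a delay-aware step size of the form c/sqrt(t + tau_t); this illustrates the setup only and is not the exact AdaDelay update rule.

import numpy as np

def delayed_sgd(grad, x0, delays, c=0.1, n_iters=500, rng=None):
    """Simulate the delayed-gradient model on one machine: at step t the
    server applies a stochastic gradient computed at the older iterate
    x_{t - tau_t}, with a delay-aware step size (assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    history = [x.copy()]
    for t in range(1, n_iters + 1):
        tau = int(delays[(t - 1) % len(delays)])          # delay of the arriving gradient
        stale = history[max(0, len(history) - 1 - tau)]   # the iterate it was computed at
        g = grad(stale, rng)                              # stochastic gradient at stale iterate
        step = c / np.sqrt(t + tau)                       # delay-aware step size (assumption)
        x = x - step * g
        history.append(x.copy())
    return x

# Example: noisy gradient of a quadratic, workers cycling through delays 0..4
grad = lambda x, rng: 2.0 * (x - 1.0) + 0.1 * rng.standard_normal(x.shape)
x_final = delayed_sgd(grad, x0=np.zeros(3), delays=[0, 1, 2, 3, 4])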
RADAGRAD: Random Projections for Adaptive Stochastic Optimization
We present RADAGRAD, a simple and computationally efficient approximation to full-matrix ADAGRAD based on dimensionality reduction using the subsampled randomized Hadamard transform. RADAGRAD is able to capture correlations in the gradients and achieves a regret similar to full-matrix ADAGRAD, both in theory and empirically, but at a computational cost comparable to the diagonal variant.
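A rough sketch of the underlying idea: accumulate a full second-moment matrix in a randomly projected low-dimensional subspace and precondition updates there. A plain Gaussian projection stands in for the subsampled randomized Hadamard transform, so this is not the RADAGRAD algorithm itself; the class name and parameters are illustrative.

import numpy as np

class ProjectedAdaGrad:
    """Full-matrix-style AdaGrad in a random k-dimensional subspace (sketch).

    Gradients are projected to k dimensions, a k x k second-moment matrix is
    accumulated there, and the preconditioned update is mapped back to the
    original space.
    """

    def __init__(self, dim, k, lr=0.1, eps=1e-8, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.P = rng.standard_normal((k, dim)) / np.sqrt(k)  # random projection (assumption)
        self.G = np.zeros((k, k))                            # accumulated outer products
        self.lr, self.eps = lr, eps

    def step(self, x, g):
        z = self.P @ g                        # project the gradient
        self.G += np.outer(z, z)              # accumulate the low-dim second moment
        # inverse square root of (G + eps*I) via eigendecomposition
        w, V = np.linalg.eigh(self.G + self.eps * np.eye(self.G.shape[0]))
        G_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, self.eps))) @ V.T
        return x - self.lr * (self.P.T @ (G_inv_sqrt @ z))   # map the update back

# Example usage on a quadratic objective
opt = ProjectedAdaGrad(dim=50, k=8)
x = np.zeros(50)
for _ in range(200):
    g = 2.0 * (x - 1.0)
    x = opt.step(x, g)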
Distributed Stochastic Optimization via Adaptive Stochastic Gradient Descent
Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial in many applications, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial algorithm that is surprisingly hard to parallelize. In this paper, we propose an efficient distributed stochastic op...
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2019
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: 10.1109/tac.2018.2816168