On the Convergence of a Family of Robust Losses for Stochastic Gradient Descent
Abstract
The convergence of Stochastic Gradient Descent (SGD) with convex loss functions has been widely studied. However, vanilla SGD methods using convex losses cannot perform well with noisy labels, which adversely affect each update of the primal variable. Unfortunately, noisy labels are ubiquitous in real-world applications such as crowdsourcing. To handle noisy labels, in this paper we present a family of robust losses for SGD methods. By employing our robust losses, SGD methods successfully reduce the negative effects of noisy labels on each update of the primal variable. We not only show that the convergence rate of SGD methods using robust losses is O(1/T), but also provide a robustness analysis of two representative robust losses. Comprehensive experiments on six real-world datasets show that SGD methods using robust losses are clearly more robust than baseline methods in most situations, while converging quickly.
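The abstract does not spell out the loss family, so as one concrete illustration the sketch below runs SGD with a capped hinge loss min(max(0, 1 - y⟨w, x⟩), τ): once an example's hinge loss exceeds the cap τ, its subgradient vanishes, so a badly mislabeled point cannot keep dragging the primal variable w. The capped hinge, the threshold τ, and the Pegasos-style step size 1/(λt) are assumptions for illustration, not necessarily the paper's exact construction.

```python
import numpy as np

def capped_hinge_subgrad(w, x, y, tau=2.0):
    """Subgradient of min(max(0, 1 - y<w, x>), tau) w.r.t. w.

    Illustrative robust loss (an assumption, not the paper's family):
    examples whose hinge loss exceeds tau -- likely label noise --
    contribute a zero subgradient, so they cannot drag w away
    from the clean decision boundary.
    """
    margin = 1.0 - y * np.dot(w, x)
    if 0.0 < margin < tau:       # active, trusted region
        return -y * x
    return np.zeros_like(w)      # correct side, or clipped as an outlier

def sgd_robust(X, Y, tau=2.0, lam=0.01, T=10_000, seed=0):
    """Plain SGD with step size eta_t = 1/(lam * t) on the
    L2-regularized capped-hinge objective."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(Y))
        eta = 1.0 / (lam * t)
        w -= eta * (lam * w + capped_hinge_subgrad(w, X[i], Y[i], tau))
    return w
```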
Similar articles
Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network
Because of the interactions among the variables of a multiple-input multiple-output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. The cement rotary kiln (CRK) is a MIMO nonlinear system in a cement factory with a complicated mechanism and uncertain disturbances. The identification of the CRK is very important for different pur...
Averaging Stochastic Gradient Descent on Riemannian Manifolds
We consider the minimization of a function defined on a Riemannian manifold M accessible only through unbiased estimates of its gradients. We develop a geometric framework to transform a sequence of slowly converging iterates generated from stochastic gradient descent (SGD) on M to an averaged iterate sequence with a robust and fast O(1/n) convergence rate. We then present an application of our...
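For intuition, here is a minimal Euclidean sketch of the iterate-averaging idea (Polyak-Ruppert averaging) behind that paper; the Riemannian version replaces the subtraction and the running mean with exponential maps and geodesic averaging, which this sketch omits. The function names and step size are illustrative assumptions.

```python
import numpy as np

def sgd_with_averaging(noisy_grad, w0, eta=0.05, T=10_000, seed=0):
    """SGD plus a running (Polyak-Ruppert) average of the iterates.

    Euclidean sketch only: the last iterate of constant-step SGD
    keeps oscillating, while the averaged iterate w_bar converges
    at the faster O(1/n) rate for suitable strongly convex objectives.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    w_bar = w0.copy()
    for t in range(1, T + 1):
        w -= eta * noisy_grad(w, rng)
        w_bar += (w - w_bar) / t   # incremental mean of the iterates
    return w_bar

# Example: noisy gradients of f(w) = 0.5 * ||w||^2 (minimizer is 0)
if __name__ == "__main__":
    g = lambda w, rng: w + 0.1 * rng.standard_normal(w.shape)
    print(sgd_with_averaging(g, np.ones(5)))
```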
Wavelet Based Estimation of the Derivatives of a Density for a Discrete-Time Stochastic Process: Lp-Losses
We propose a wavelet-based method for estimating the derivatives of a probability density for a sequence of random variables with a common one-dimensional probability density function, and obtain an upper bound on the Lp-losses of such estimators. We suppose that the process is strongly mixing and show that the rate of convergence essentially depends on the behavior of a special quad...
Minimizing Calibrated Loss using Stochastic Low-Rank Newton Descent for large scale image classification
A standard approach for large-scale image classification involves high-dimensional features and the Stochastic Gradient Descent (SGD) algorithm for minimizing the classical hinge loss in the primal space. Although the complexity of Stochastic Gradient Descent is linear in the number of samples, this method suffers from slow convergence. In order to cope with this issue, we propose here a Stochas...
Stochastic Coordinate Descent Methods for Regularized Smooth and Nonsmooth Losses
Stochastic Coordinate Descent (SCD) methods are among the first optimization schemes suggested for efficiently solving large-scale problems. However, until now there has been a gap between the convergence rate analysis and practical SCD algorithms for general smooth losses, and there is no primal SCD algorithm for nonsmooth losses. In this paper, we discuss these issues using the recently develop...
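As a rough illustration of the coordinate-descent template discussed in that abstract (not its proposed algorithm), the following sketch runs stochastic coordinate descent on a smooth ridge-regression loss with per-coordinate step sizes; the problem, names, and constants are all assumptions for illustration.

```python
import numpy as np

def scd_ridge(X, y, lam=0.1, T=5_000, seed=0):
    """Stochastic coordinate descent for ridge regression:
    f(w) = ||Xw - y||^2 / (2n) + lam * ||w||^2 / 2.

    Each step updates one random coordinate with its own
    Lipschitz step size, keeping the residual r = Xw - y
    up to date in O(n) instead of recomputing it.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    L = (X ** 2).sum(axis=0) / n + lam    # per-coordinate Lipschitz constants
    w = np.zeros(d)
    r = -y.astype(float)                  # residual Xw - y at w = 0
    for _ in range(T):
        j = rng.integers(d)
        g = X[:, j] @ r / n + lam * w[j]  # partial derivative along coordinate j
        step = g / L[j]
        w[j] -= step
        r -= step * X[:, j]               # maintain r = Xw - y incrementally
    return w
```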