Stochastic gradient algorithms estimate the gradient based on only one or a fewsamples and enjoy low computational cost per iteration. They have been widelyused in large-scale optimization problems. However, stochastic gradient algo-rithms are usually slow to converge and achieve sub-linear convergence rates,due to the inherent variance in the gradient computation. To accelerate...