Accumulated Gradient Normalization
Authors
Abstract
This work addresses the instability of asynchronous data-parallel optimization. It does so by introducing a novel distributed optimizer that efficiently optimizes a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller than that of an accumulated gradient, while the delta still provides a better direction towards a minimum than a single first-order gradient. This in turn forces possible implicit momentum fluctuations to be more aligned, under the assumption that all workers contribute towards a single minimum. Because staleness in asynchrony induces (implicit) momentum, our approach mitigates the parameter staleness problem more effectively, and we show empirically that it achieves a better convergence rate than other optimizers such as asynchronous EASGD and DynSGD.
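The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions: each worker starts from the current central parameters, takes λ local SGD steps on its own mini-batches, and pushes the accumulated update divided by λ to the parameter server, which adds it to the central model. The function names `agn_worker_delta`, `run_parameter_server` and `quadratic_grad`, as well as the quadratic toy loss, are hypothetical. Workers run one after another here, so real asynchrony and staleness are not modelled.

```python
import numpy as np


def agn_worker_delta(theta_central, grad_fn, batches, lr):
    """Hypothetical AGN-style worker round (sketch, not the authors' code).

    Takes one local SGD step per mini-batch, starting from a copy of the
    central parameters, and returns the accumulated update divided by the
    number of local steps (lambda = len(batches)).
    """
    theta = theta_central.copy()
    accumulated = np.zeros_like(theta)
    for batch in batches:
        step = -lr * grad_fn(theta, batch)  # plain first-order update
        theta += step
        accumulated += step
    return accumulated / len(batches)  # assumed normalization: divide by lambda


def run_parameter_server(theta0, grad_fn, worker_batches, lr, rounds):
    """Toy parameter server: applies each worker's normalized delta in turn.

    Workers are simulated sequentially, so no staleness occurs here.
    """
    theta = theta0.copy()
    for _ in range(rounds):
        for batches in worker_batches:
            theta += agn_worker_delta(theta, grad_fn, batches, lr)
    return theta


def quadratic_grad(theta, batch):
    """Gradient of the assumed toy loss 0.5 * ||theta - batch||^2."""
    return theta - batch


if __name__ == "__main__":
    target = np.array([1.0, -2.0])
    workers = [[target, target], [target, target]]  # two workers, lambda = 2
    delta = agn_worker_delta(np.zeros(2), quadratic_grad, workers[0], lr=0.5)
    print(delta)  # normalized worker delta
    print(run_parameter_server(np.zeros(2), quadratic_grad, workers, 0.5, 20))
```

With λ = 2 and a learning rate of 0.5 on this toy loss, the accumulated update from θ is 0.75·(target − θ), while the pushed delta is half of that. This illustrates the abstract's point that the normalized worker delta is smaller in magnitude than the accumulated gradient while pointing in the same direction.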
Similar resources
Recurrent Normalization Propagation
We propose an LSTM parametrization that preserves the means and variances of the hidden states and memory cells across time. While having training benefits similar to Recurrent Batch Normalization and Layer Normalization, it does not need to estimate statistics at each time step, therefore, requiring fewer computations overall. We also investigate the parametrization impact on the gradient flow...
Gradient Normalization & Depth Based Decay For Deep Learning
In this paper we introduce a novel method of gradient normalization and decay with respect to depth. Our method leverages the simple concept of normalizing all gradients in a deep neural network, and then decaying said gradients with respect to their depth in the network. Our proposed normalization and decay techniques can be used in conjunction with most current state of the art optimizers and...
Fluorophore-labeled primers improve the sensitivity, versatility, and normalization of denaturing gradient gel electrophoresis.
Denaturing gradient gel electrophoresis (DGGE) is widely used in microbial ecology. We tested the effect of fluorophore-labeled primers on DGGE band migration, sensitivity, and normalization. The fluorophores Cy5 and Cy3 did not visibly alter DGGE fingerprints; however, 6-carboxyfluorescein retarded band migration. Fluorophore modification improved the sensitivity of DGGE fingerprint detection ...
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does no...
An Engineering Approach to Generalized Conjugate Gradient
This is an elementary introductory paper (rather a tutorial) for generalized conjugate gradient (CG) methods. The emphasis is on "understanding" the approach of the now rather sophisticated CG-type methods and beyond. Practical aspects like normalization and smoothing are discussed. Then the development of different types of methods with their characteristic properties is presented. For a m...
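The Weight Normalization entry above describes decoupling the length of each weight vector from its direction. The widely used formulation writes w = g · v / ‖v‖, where v is an unconstrained vector and g is a learned scalar. The NumPy sketch below shows only this forward reparameterization; the function names are illustrative and not taken from any library, and the gradient computations for g and v are omitted.

```python
import numpy as np


def weight_norm(v, g):
    """Weight normalization of a single vector: w = g * v / ||v||.

    The direction of w comes from v and its length is the scalar g.
    """
    return g * v / np.linalg.norm(v)


def weight_norm_rows(V, g):
    """Row-wise weight normalization for a dense layer (illustrative sketch).

    V has shape (out_features, in_features); g has shape (out_features,).
    Output unit i gets weight row with direction V[i] and norm g[i].
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return g[:, None] * V / norms


if __name__ == "__main__":
    v = np.array([3.0, 4.0])
    print(weight_norm(v, 2.0))  # → [1.2 1.6]
```

Because w depends on v only through v/‖v‖, rescaling v leaves w unchanged; the length of w is controlled by g alone, which is the decoupling the abstract refers to.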