Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective approach to assist the SGD in finding better optima, in terms of generalization. From a statistical perspective, weight averaging contributes to variance reduction. Recently, the well-established stochastic weight averaging (SWA) method was proposed, which featured the application of a cyclical or high-constant (CHC) learning-rate schedule ...
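As a minimal sketch of the weight-averaging idea described above (not the authors' exact procedure), the running equal-weight average of sampled weights can be maintained incrementally; the NumPy-based loop and the helper `update_running_average` below are hypothetical illustrations:

```python
import numpy as np

def update_running_average(w_avg, w_new, n_models):
    """Incremental equal-weight average: w_avg <- (w_avg * n + w_new) / (n + 1)."""
    return (w_avg * n_models + w_new) / (n_models + 1)

# Hypothetical loop: average the weight vectors sampled by the backbone SGD,
# e.g. one sample per learning-rate cycle, as in SWA-style schemes.
rng = np.random.default_rng(0)
w_avg, n = None, 0
for cycle in range(5):
    w = rng.normal(size=10)  # stand-in for weights produced by an SGD run
    w_avg = w.copy() if w_avg is None else update_running_average(w_avg, w, n)
    n += 1
print(w_avg)  # averaged weights; averaging reduces the variance of the iterates
```

In practice the same update is applied parameter-tensor by parameter-tensor to a full network rather than to a single vector.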