Adaptive Natural Gradient Learning Based on Riemannian Metric of Score Matching
Authors
Abstract
The natural gradient is a powerful method to improve the transient dynamics of learning by considering the geometric structure of the parameter space. Many natural gradient methods have been developed with regard to the Kullback-Leibler (KL) divergence and its Fisher metric, but the framework of the natural gradient can be essentially extended to other divergences. In this study, we focus on score matching, which is an alternative to maximum likelihood learning for unnormalized statistical models, and introduce its Riemannian metric. By using the score matching metric, we derive an adaptive natural gradient algorithm that does not require the computationally demanding inversion of the metric. Experimental results on a multi-layer neural network model demonstrate that the proposed method avoids the plateau phenomenon and accelerates the convergence of learning compared with conventional stochastic gradient descent.

1 SCORE MATCHING AND ITS RIEMANNIAN METRIC

Score matching has been developed for training unnormalized statistical models and has been applied to various practical problems such as signal processing (Hyvärinen, 2005) and representation learning for visual and acoustic data (Köster & Hyvärinen, 2010). It can also train single-layer models (Swersky et al., 2011; Vincent, 2011) and a two-layer model with analytically intractable normalization constants (Köster & Hyvärinen, 2010), which are hard to train by maximum likelihood learning.

The objective function of score matching is the squared distance between the derivatives of the log-densities,

$$D_{\mathrm{SM}}[q : p] = \int dx \, q(x) \sum_i \left| \partial_i \log q(x) - \partial_i \log p(x) \right|^2,$$

where $\partial_i = \partial/\partial x_i$ denotes the derivative with respect to the $i$-th random variable. In this paper, we refer to this objective function as the score matching (SM) divergence.

In general, we can derive a Riemannian structure from any divergence (Eguchi, 1983; Amari, 2016). Let us consider a parametric probability distribution $p(x; \xi)$. When we estimate the parameter $\xi$ with a divergence $D[q : p]$, its parameter space has the Riemannian metric matrix $G$ defined by

$$D[p(x; \xi) : p(x; \xi + d\xi)] = \sum_{i,j} G_{ij} \, d\xi_i \, d\xi_j.$$

The metric matrix $G$ can be obtained from the second derivative,

$$G_{ij} = \left. \frac{\partial^2}{\partial \xi'_i \, \partial \xi'_j} D[p(x; \xi) : p(x; \xi')] \right|_{\xi' = \xi}.$$

In particular, when we consider the SM divergence, its metric becomes the following positive semi-definite matrix,
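To make the definitions above concrete, here is a minimal sketch, assuming a toy 1-D Gaussian family $p(x; \xi)$ with $\xi = (\mu, \log \sigma)$ and plain NumPy; the function names (`model_score`, `sm_divergence`, `sm_metric`) are hypothetical, not from the paper. It estimates the SM divergence by Monte Carlo and realizes the definition of $G_{ij}$ by finite differences. Note that the naive update at the end inverts $G$ explicitly, which is exactly the cost the paper's adaptive algorithm is designed to avoid.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_score(x, xi):
    """Data score d/dx log p(x; xi) for a 1-D Gaussian, xi = (mu, log_sigma).
    The normalization constant of p never appears, which is the point of SM."""
    mu, log_sigma = xi
    return -(x - mu) / np.exp(2.0 * log_sigma)

def sm_divergence(xi_prime, xi, x):
    """Monte Carlo estimate of D_SM[p(.; xi) : p(.; xi')] on samples x ~ p(.; xi):
    the mean squared difference of the two data scores."""
    diff = model_score(x, xi) - model_score(x, xi_prime)
    return np.mean(diff ** 2)

def sm_metric(xi, x, eps=1e-3):
    """G_ij = d^2/(dxi'_i dxi'_j) D_SM[p(xi) : p(xi')] at xi' = xi, via central
    finite differences in xi' (samples x are held fixed, so D is smooth)."""
    d, I = len(xi), np.eye(len(xi))
    G = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            G[i, j] = (
                sm_divergence(xi + eps * (I[i] + I[j]), xi, x)
                - sm_divergence(xi + eps * (I[i] - I[j]), xi, x)
                - sm_divergence(xi - eps * (I[i] - I[j]), xi, x)
                + sm_divergence(xi - eps * (I[i] + I[j]), xi, x)
            ) / (4.0 * eps ** 2)
    return G

xi = np.array([0.5, np.log(1.5)])             # current parameters (mu, log_sigma)
x = rng.normal(xi[0], np.exp(xi[1]), 50_000)  # samples from p(x; xi)

G = sm_metric(xi, x)
print(G)  # approx. diag(2/sigma^4, 8/sigma^2): positive definite here

# A naive natural gradient step preconditions an ordinary gradient with G^{-1};
# `grad` is a placeholder for the gradient of whatever loss is being minimized.
grad = np.array([0.1, -0.05])
xi_next = xi - 0.1 * np.linalg.solve(G, grad)
```

For this two-parameter family the metric is diagonal, so the natural gradient merely rescales each coordinate; in a multi-layer network the explicit solve against $G$ becomes the bottleneck, which is what motivates the inversion-free adaptive update derived in the paper.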
Similar resources
Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient
The parameter space of neural networks has a Riemannian metric structure. The natural Riemannian gradient should be used instead of the conventional gradient, since the former gives the true steepest descent direction of a loss function in the Riemannian space. Stochastic gradient learning is much more effective if the natural gradient is used. The present paper ...
Full text

Natural Gradient Approach to Blind Separation of Over- and Under-Complete Mixtures
In this paper we study natural gradient approaches to blind separation of over- and under-complete mixtures. First we introduce Lie group structures on the manifolds of the under- and over-complete mixture matrices, respectively, and endow the manifolds with Riemannian metrics based on the property of Lie groups. Then we derive the natural gradients on the manifolds using the isometry of the Rieman...
Full text

A Neural Stiefel Learning based on Geodesics Revisited
In this paper we present an unsupervised learning algorithm for neural networks with p inputs and m outputs whose weight vectors have orthonormal constraints. In this setting the learning algorithm can be regarded as optimization posed on the Stiefel manifold, and we generalize the natural gradient method to this case based on geodesics. By exploiting its geometric property as a quotient space: ...
Full text

Geometrical Structures of FIR Manifold and Their Application to Multichannel Blind Deconvolution
In this paper we study geometrical structures on the manifold of FIR filters and their application to multichannel blind deconvolution. First we introduce the Lie group and Riemannian metric to the manifold of FIR filters. Then we derive the natural gradient on the manifold using the isometry of the Riemannian metric. Using the natural gradient, we present a novel learning algorithm for blind deconvol...
Full text

Gradient Learning in Structured Parameter Spaces: Adaptive Blind Separation of Signal Sources
The present paper discusses the natural gradient descent learning rules in parameter spaces which have Riemannian geometrical structures. A modification is necessary for defining the steepest descent (gradient) direction in a Riemannian parameter space. Parameter spaces of multilayer perceptrons are good examples of the Riemannian nature. Another example is the space of matrices on which adaptive...
Full text