Adaptive Natural Gradient Learning Based on Riemannian Metric of Score Matching
Authors
Abstract
The natural gradient is a powerful method to improve the transient dynamics of learning by considering the geometric structure of the parameter space. Many natural gradient methods have been developed with regard to the Kullback-Leibler (KL) divergence and its Fisher metric, but the framework of the natural gradient can be essentially extended to other divergences. In this study, we focus on score matching, which is an alternative to maximum likelihood learning for unnormalized statistical models, and introduce its Riemannian metric. By using the score matching metric, we derive an adaptive natural gradient algorithm that does not require the computationally demanding inversion of the metric. Experimental results on a multi-layer neural network model demonstrate that the proposed method avoids the plateau phenomenon and accelerates the convergence of learning compared with conventional stochastic gradient descent.

1 SCORE MATCHING AND ITS RIEMANNIAN METRIC

Score matching has been developed for training unnormalized statistical models and has been applied to various practical problems such as signal processing (Hyvärinen, 2005) and representation learning for visual and acoustic data (Köster & Hyvärinen, 2010). It can also train single-layer models (Swersky et al., 2011; Vincent, 2011) and a two-layer model with analytically intractable normalization constants (Köster & Hyvärinen, 2010), which are hard to train by maximum likelihood learning.

The objective function of score matching is the squared distance between the derivatives of the log-densities,

$$D_{\mathrm{SM}}[q : p] = \int dx \, q(x) \sum_i \left| \partial_i \log q(x) - \partial_i \log p(x) \right|^2,$$

where $\partial_i = \partial/\partial x_i$ denotes the derivative with respect to the $i$-th random variable. In this paper, we refer to this objective function as the score matching (SM) divergence.

In general, we can derive a Riemannian structure from any divergence (Eguchi, 1983; Amari, 2016). Let us consider a parametric probability distribution $p(x; \xi)$. When we estimate the parameter $\xi$ with a divergence $D[q : p]$, its parameter space has the Riemannian metric matrix $G$ defined by

$$D[p(x; \xi) : p(x; \xi + d\xi)] = \sum_{i,j} G_{ij} \, d\xi_i \, d\xi_j.$$

The metric matrix $G$ can be obtained from the second derivative,

$$G_{ij} = \left. \frac{\partial^2}{\partial \xi'_i \, \partial \xi'_j} D[p(x; \xi) : p(x; \xi')] \right|_{\xi' = \xi}.$$

In particular, when we consider the SM divergence, its metric becomes the following positive semi-definite matrix,
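To make the definitions above concrete, here is a minimal sketch, assuming a toy 1-D Gaussian family $p(x; \xi)$ with $\xi = (\mu, \log \sigma)$ and plain NumPy; the function names (`model_score`, `sm_divergence`, `sm_metric`) are hypothetical, not from the paper. It estimates the SM divergence by Monte Carlo and realizes the definition of $G_{ij}$ by finite differences. Note that the naive update at the end inverts $G$ explicitly, which is exactly the cost the paper's adaptive algorithm is designed to avoid.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_score(x, xi):
    """Data score d/dx log p(x; xi) for a 1-D Gaussian, xi = (mu, log_sigma).
    The normalization constant of p never appears, which is the point of SM."""
    mu, log_sigma = xi
    return -(x - mu) / np.exp(2.0 * log_sigma)

def sm_divergence(xi_prime, xi, x):
    """Monte Carlo estimate of D_SM[p(.; xi) : p(.; xi')] on samples x ~ p(.; xi):
    the mean squared difference of the two data scores."""
    diff = model_score(x, xi) - model_score(x, xi_prime)
    return np.mean(diff ** 2)

def sm_metric(xi, x, eps=1e-3):
    """G_ij = d^2/(dxi'_i dxi'_j) D_SM[p(xi) : p(xi')] at xi' = xi, via central
    finite differences in xi' (samples x are held fixed, so D is smooth)."""
    d, I = len(xi), np.eye(len(xi))
    G = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            G[i, j] = (
                sm_divergence(xi + eps * (I[i] + I[j]), xi, x)
                - sm_divergence(xi + eps * (I[i] - I[j]), xi, x)
                - sm_divergence(xi - eps * (I[i] - I[j]), xi, x)
                + sm_divergence(xi - eps * (I[i] + I[j]), xi, x)
            ) / (4.0 * eps ** 2)
    return G

xi = np.array([0.5, np.log(1.5)])             # current parameters (mu, log_sigma)
x = rng.normal(xi[0], np.exp(xi[1]), 50_000)  # samples from p(x; xi)

G = sm_metric(xi, x)
print(G)  # approx. diag(2/sigma^4, 8/sigma^2): positive definite here

# A naive natural gradient step preconditions an ordinary gradient with G^{-1};
# `grad` is a placeholder for the gradient of whatever loss is being minimized.
grad = np.array([0.1, -0.05])
xi_next = xi - 0.1 * np.linalg.solve(G, grad)
```

For this two-parameter family the metric is diagonal, so the natural gradient merely rescales each coordinate; in a multi-layer network the explicit solve against $G$ becomes the bottleneck, which is what motivates the inversion-free adaptive update derived in the paper.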
Similar resources
Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient
The parameter space of neural networks has a Riemannian metric structure. The natural Riemannian gradient should be used instead of the conventional gradient, since the former gives the true steepest descent direction of a loss function in the Riemannian space. Stochastic gradient learning is much more effective if the natural gradient is used. The present paper ...
Full text

Natural Gradient Approach to Blind Separation of Over- and Under-Complete Mixtures
In this paper we study natural gradient approaches to blind separation of over- and under-complete mixtures. First we introduce Lie group structures on the manifolds of the under- and over-complete mixture matrices, respectively, and endow the manifolds with Riemannian metrics based on the property of Lie groups. Then we derive the natural gradients on the manifolds using the isometry of the Rieman...
Full text

A Neural Stiefel Learning based on Geodesics Revisited
In this paper we present an unsupervised learning algorithm for neural networks with p inputs and m outputs whose weight vectors have orthonormal constraints. In this setting the learning algorithm can be regarded as optimization posed on the Stiefel manifold, and we generalize the natural gradient method to this case based on geodesics. By exploiting its geometric property as a quotient space: ...
Full text

Geometrical Structures of FIR Manifold and Their Application to Multichannel Blind Deconvolution
In this paper we study geometrical structures on the manifold of FIR filters and their application to multichannel blind deconvolution. First we introduce the Lie group and Riemannian metric to the manifold of FIR filters. Then we derive the natural gradient on the manifold using the isometry of the Riemannian metric. Using the natural gradient, we present a novel learning algorithm for blind deconvol...
Full text

Gradient Learning in Structured Parameter Spaces: Adaptive Blind Separation of Signal Sources
The present paper discusses the natural gradient descent learning rules in parameter spaces which have Riemannian geometrical structures. A modification is necessary for defining the steepest descent (gradient) direction in a Riemannian parameter space. Parameter spaces of multilayer perceptrons are good examples of the Riemannian nature. Another example is the space of matrices on which adaptive...
Full text