KL-learning: Online solution of Kullback-Leibler control problems
Authors
Abstract
We introduce a stochastic approximation method for the solution of an ergodic Kullback-Leibler control problem. A Kullback-Leibler control problem is a Markov decision process on a finite state space in which the control cost is proportional to the Kullback-Leibler divergence of the controlled transition probabilities with respect to the uncontrolled transition probabilities. The algorithm discussed in this work allows for a sound theoretical analysis using the ODE method. In a numerical experiment the algorithm is shown to be comparable to the power method and to the related Z-learning algorithm in terms of convergence speed. It may serve as the basis of a reinforcement-learning-style algorithm for Markov decision problems.
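For intuition: in this setting the optimal solution reduces to a principal-eigenvalue problem for the matrix G = diag(e^{-q}) P, which is why the power method is a natural baseline, and a Z-learning-style stochastic approximation can estimate the eigenvector from a single trajectory of the uncontrolled chain. Below is a minimal Python sketch of such an update; the function name, step-size schedule, and per-step renormalisation are our assumptions for illustration, not the paper's exact algorithm.

import numpy as np

def kl_learning_sketch(P, q, n_steps=100_000, seed=0):
    """Estimate a desirability vector z solving lam * z = exp(-q) * (P @ z)
    from one trajectory of the uncontrolled chain P (a sketch only, not
    the paper's exact algorithm).

    P : (n, n) row-stochastic matrix of uncontrolled transition probabilities
    q : (n,) nonnegative state costs
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    z = np.ones(n) / np.sqrt(n)          # positive initial guess, unit norm
    x = 0                                # arbitrary start state
    for t in range(1, n_steps + 1):
        alpha = 10.0 / (100.0 + t)       # hypothetical Robbins-Monro step size
        y = rng.choice(n, p=P[x])        # sample an uncontrolled transition
        # Z-learning-style update toward exp(-q(x)) * z(y)
        z[x] = (1.0 - alpha) * z[x] + alpha * np.exp(-q[x]) * z[y]
        z /= np.linalg.norm(z)           # renormalise, as in the power method
        x = y
    lam = z @ (np.exp(-q) * (P @ z))     # Rayleigh-quotient estimate of the
    return z, lam                        # principal eigenvalue

# Illustrative usage on a small random chain (not from the paper):
# rng = np.random.default_rng(1)
# P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
# q = np.linspace(0.0, 1.0, 5)
# z, lam = kl_learning_sketch(P, q)

The optimal controlled transition law can then be read off by reweighting the uncontrolled dynamics, p*(y|x) proportional to p(y|x) z(y).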
Similar Resources
Online solution of the average cost Kullback-Leibler optimization problem
We introduce a stochastic approximation method for the solution of a Kullback-Leibler optimization problem, which is a generalization of Z-learning introduced by [Todorov, 2007]. A KL-optimization problem is a Markov decision process with a finite state space and continuous control space. Because the control cost has a special form involving the Kullback-Leibler divergence, it can be shown that th...
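For context, the "special form" of the control cost referred to here is, in Todorov's linearly-solvable MDP framework, a state cost plus a KL penalty on the chosen transition law; the notation below (q, p, z, \bar c) is the standard one from that literature rather than from the truncated abstract:

\[
  c(x, u) \;=\; q(x) + \mathrm{KL}\big(u(\cdot \mid x) \,\|\, p(\cdot \mid x)\big).
\]

Writing z(x) = e^{-v(x)} for the desirability function, the average-cost Bellman equation becomes linear in z, a principal-eigenvalue problem, and the optimal control reweights the passive dynamics:

\[
  e^{-\bar c}\, z(x) \;=\; e^{-q(x)} \sum_{y} p(y \mid x)\, z(y),
  \qquad
  u^{*}(y \mid x) \;=\; \frac{p(y \mid x)\, z(y)}{\sum_{y'} p(y' \mid x)\, z(y')}.
\]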
Adaptive weighted learning for linear regression problems via Kullback-Leibler divergence
In this paper, we propose adaptive weighted learning for linear regression problems via the Kullback-Leibler (KL) divergence. An alternating optimization method is used to solve the proposed model. Meanwhile, we theoretically demonstrate that the solution of the optimization algorithm converges to a stationary point of the model. In addition, we also fuse global linear regression and class-or...
Online Blind Separation of Dependent Sources Using Nonnegative Matrix Factorization Based on KL Divergence
This paper proposes a novel online algorithm for nonnegative matrix factorization (NMF) based on the generalized Kullback-Leibler (KL) divergence criterion, aimed at overcoming the high computational cost that conventional batch NMF algorithms incur on large-scale data. It features stable alternating updates of the factors for each newly arriving observation, and provides an efficient solution f...
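As one plausible instantiation of such a per-observation update, the classical Lee-Seung multiplicative rules for the KL objective can be applied column by column; the function name, damping factor eta, and inner-iteration count below are our assumptions, not necessarily the cited paper's scheme.

import numpy as np

def online_kl_nmf_step(W, v, inner_iters=20, eta=0.01, eps=1e-9):
    """One online KL-NMF step for a new nonnegative observation v (sketch).

    Fits coefficients h for v by Lee-Seung multiplicative updates with the
    dictionary W held fixed, then takes a damped multiplicative step on W.
    """
    h = np.ones(W.shape[1])                        # positive initialisation
    for _ in range(inner_iters):
        ratio = v / (W @ h + eps)
        h *= (W.T @ ratio) / (W.sum(axis=0) + eps)
    # For a single column, the full multiplicative dictionary update
    # W <- W * ((v / (W h)) h^T) / (1 h^T) reduces to scaling row i by ratio_i;
    # eta damps the step so one observation cannot overwrite the dictionary.
    ratio = v / (W @ h + eps)
    W *= (1.0 - eta) + eta * ratio[:, None]
    return W, h

In practice one would also renormalise the columns of W periodically, since NMF factorizations are only determined up to a diagonal rescaling.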
Information Complexity-Based Regularization Parameter Selection for Solution of Ill-Conditioned Inverse Problems
We propose an information complexity-based regularization parameter selection method for solution of ill-conditioned inverse problems. The regularization parameter is selected to be the minimizer of the Kullback-Leibler (KL) distance between the unknown data-generating distribution and the fitted distribution. The KL distance is approximated by an information complexity (ICOMP) criterion develo...
Lower Bounds and Aggregation in Density Estimation
In this paper we prove the optimality of an aggregation procedure. We prove lower bounds for model-selection-type aggregation of M density estimators with respect to the Kullback-Leibler divergence (KL), the Hellinger distance, and the L1-distance. The lower bound, with respect to the KL distance, can be achieved by the online-type estimate suggested, among others, by Yang (2000a). Combining these res...
Journal: CoRR
Volume: abs/1112.1996
Publication year: 2011