KL-learning: Online solution of Kullback-Leibler control problems
Authors
Abstract
We introduce a stochastic approximation method for the solution of an ergodic Kullback-Leibler control problem. A Kullback-Leibler control problem is a Markov decision process on a finite state space in which the control cost is proportional to the Kullback-Leibler divergence of the controlled transition probabilities with respect to the uncontrolled transition probabilities. The algorithm discussed in this work allows for a sound theoretical analysis using the ODE method. In a numerical experiment the algorithm is shown to be comparable to the power method and to the related Z-learning algorithm in terms of convergence speed. It may serve as the basis of a reinforcement-learning-style algorithm for Markov decision problems.
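For intuition: in this setting the optimal solution reduces to a principal-eigenvalue problem for the matrix G = diag(e^{-q}) P, which is why the power method is a natural baseline, and a Z-learning-style stochastic approximation can estimate the eigenvector from a single trajectory of the uncontrolled chain. Below is a minimal Python sketch of such an update; the function name, step-size schedule, and per-step renormalisation are our assumptions for illustration, not the paper's exact algorithm.

import numpy as np

def kl_learning_sketch(P, q, n_steps=100_000, seed=0):
    """Estimate a desirability vector z solving lam * z = exp(-q) * (P @ z)
    from one trajectory of the uncontrolled chain P (a sketch only, not
    the paper's exact algorithm).

    P : (n, n) row-stochastic matrix of uncontrolled transition probabilities
    q : (n,) nonnegative state costs
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    z = np.ones(n) / np.sqrt(n)          # positive initial guess, unit norm
    x = 0                                # arbitrary start state
    for t in range(1, n_steps + 1):
        alpha = 10.0 / (100.0 + t)       # hypothetical Robbins-Monro step size
        y = rng.choice(n, p=P[x])        # sample an uncontrolled transition
        # Z-learning-style update toward exp(-q(x)) * z(y)
        z[x] = (1.0 - alpha) * z[x] + alpha * np.exp(-q[x]) * z[y]
        z /= np.linalg.norm(z)           # renormalise, as in the power method
        x = y
    lam = z @ (np.exp(-q) * (P @ z))     # Rayleigh-quotient estimate of the
    return z, lam                        # principal eigenvalue

# Illustrative usage on a small random chain (not from the paper):
# rng = np.random.default_rng(1)
# P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
# q = np.linspace(0.0, 1.0, 5)
# z, lam = kl_learning_sketch(P, q)

The optimal controlled transition law can then be read off by reweighting the uncontrolled dynamics, p*(y|x) proportional to p(y|x) z(y).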
Similar Resources
Online solution of the average cost Kullback-Leibler optimization problem
We introduce a stochastic approximation method for the solution of a Kullback-Leibler optimization problem, which is a generalization of Z-learning introduced by [Todorov, 2007]. A KL-optimization problem is a Markov decision process with a finite state space and continuous control space. Because the control cost has a special form involving the Kullback-Leibler divergence, it can be shown that th...
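For context, the "special form" of the control cost referred to here is, in Todorov's linearly-solvable MDP framework, a state cost plus a KL penalty on the chosen transition law; the notation below (q, p, z, \bar c) is the standard one from that literature rather than from the truncated abstract:

\[
  c(x, u) \;=\; q(x) + \mathrm{KL}\big(u(\cdot \mid x) \,\|\, p(\cdot \mid x)\big).
\]

Writing z(x) = e^{-v(x)} for the desirability function, the average-cost Bellman equation becomes linear in z, a principal-eigenvalue problem, and the optimal control reweights the passive dynamics:

\[
  e^{-\bar c}\, z(x) \;=\; e^{-q(x)} \sum_{y} p(y \mid x)\, z(y),
  \qquad
  u^{*}(y \mid x) \;=\; \frac{p(y \mid x)\, z(y)}{\sum_{y'} p(y' \mid x)\, z(y')}.
\]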
Adaptive weighted learning for linear regression problems via Kullback-Leibler divergence
In this paper, we propose adaptive weighted learning for linear regression problems via the Kullback-Leibler (KL) divergence. An alternating optimization method is used to solve the proposed model. Meanwhile, we theoretically demonstrate that the solution of the optimization algorithm converges to a stationary point of the model. In addition, we also fuse global linear regression and class-or...
Online Blind Separation of Dependent Sources Using Nonnegative Matrix Factorization Based on KL Divergence
This paper proposes a novel online algorithm for nonnegative matrix factorization (NMF) based on the generalized Kullback-Leibler (KL) divergence criterion, aimed at overcoming the high computational cost that conventional batch NMF algorithms incur on large-scale data. It features stable alternating updates of the factors for each newly arriving observation, and provides an efficient solution f...
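As one plausible instantiation of such a per-observation update, the classical Lee-Seung multiplicative rules for the KL objective can be applied column by column; the function name, damping factor eta, and inner-iteration count below are our assumptions, not necessarily the cited paper's scheme.

import numpy as np

def online_kl_nmf_step(W, v, inner_iters=20, eta=0.01, eps=1e-9):
    """One online KL-NMF step for a new nonnegative observation v (sketch).

    Fits coefficients h for v by Lee-Seung multiplicative updates with the
    dictionary W held fixed, then takes a damped multiplicative step on W.
    """
    h = np.ones(W.shape[1])                        # positive initialisation
    for _ in range(inner_iters):
        ratio = v / (W @ h + eps)
        h *= (W.T @ ratio) / (W.sum(axis=0) + eps)
    # For a single column, the full multiplicative dictionary update
    # W <- W * ((v / (W h)) h^T) / (1 h^T) reduces to scaling row i by ratio_i;
    # eta damps the step so one observation cannot overwrite the dictionary.
    ratio = v / (W @ h + eps)
    W *= (1.0 - eta) + eta * ratio[:, None]
    return W, h

In practice one would also renormalise the columns of W periodically, since NMF factorizations are only determined up to a diagonal rescaling.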
Information Complexity-Based Regularization Parameter Selection for Solution of Ill-Conditioned Inverse Problems
We propose an information complexity-based regularization parameter selection method for solution of ill-conditioned inverse problems. The regularization parameter is selected to be the minimizer of the Kullback-Leibler (KL) distance between the unknown data-generating distribution and the fitted distribution. The KL distance is approximated by an information complexity (ICOMP) criterion develo...
Lower Bounds and Aggregation in Density Estimation
In this paper we prove the optimality of an aggregation procedure. We prove lower bounds for model-selection-type aggregation of M density estimators with respect to the Kullback-Leibler divergence (KL), the Hellinger distance, and the L1-distance. The lower bound, with respect to the KL distance, can be achieved by the online-type estimate suggested, among others, by Yang (2000a). Combining these res...
Journal: CoRR
Volume: abs/1112.1996
Publication year: 2011