From information theoretic dualities to Path Integral and Kullback Leibler control: Continuous and Discrete Time formulations
Authors
Abstract
This paper presents a unified view of stochastic optimal control theory as developed within the machine learning and control theory communities. In particular, we show the mathematical connection between recent work on Path Integral (PI) and Kullback-Leibler (KL) divergence stochastic optimal control and earlier work on risk sensitivity and the fundamental dualities between free energy and relative entropy. We discuss applications of the relationship between free energy and relative entropy to two classes of systems: nonlinear stochastic dynamical systems affine in noise, and nonlinear stochastic dynamics affine in both control and noise. For the latter class, we provide the PI optimal control and its iterative formulation. In addition, we connect PI control derived via Dynamic Programming with the information theoretic dualities. Finally, we provide links to KL stochastic optimal control and discuss generalizations and future work.
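The duality between free energy and relative entropy mentioned in the abstract can be illustrated numerically on a finite sample space. The sketch below is not taken from the paper: it assumes one common form of the duality, namely that the free energy `-(1/rho) * log E_p[exp(-rho * J)]` equals the minimum over distributions `q` of `E_q[J] + (1/rho) * KL(q || p)`, attained at `q*` proportional to `p * exp(-rho * J)`. The names `free_energy`, `kl`, `objective`, `optimal_q` and the example numbers are hypothetical, and the paper's own sign convention and notation may differ.

```python
import math

def free_energy(p, cost, rho):
    """-(1/rho) * log E_p[exp(-rho * cost)] for a discrete distribution p."""
    expectation = sum(pi * math.exp(-rho * c) for pi, c in zip(p, cost))
    return -math.log(expectation) / rho

def kl(q, p):
    """Relative entropy KL(q || p) for discrete distributions (q_i = 0 terms skipped)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def objective(q, p, cost, rho):
    """Expected cost under q plus the scaled relative entropy to p."""
    expected_cost = sum(qi * c for qi, c in zip(q, cost))
    return expected_cost + kl(q, p) / rho

def optimal_q(p, cost, rho):
    """Minimizer of the objective: p reweighted by exp(-rho * cost), normalized."""
    weights = [pi * math.exp(-rho * c) for pi, c in zip(p, cost)]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical example: three outcomes with prior p and state cost J.
p = [0.5, 0.3, 0.2]
cost = [1.0, 2.0, 0.5]
rho = 2.0

q_star = optimal_q(p, cost, rho)
# The gap between the minimized objective and the free energy should vanish.
gap = objective(q_star, p, cost, rho) - free_energy(p, cost, rho)
# Any other distribution, here q = p itself, should give a larger objective.
slack = objective(p, p, cost, rho) - free_energy(p, cost, rho)
```

Under these assumptions `gap` is zero up to floating-point error and `slack` is non-negative, which is the discrete analogue of the statement that the free energy lower-bounds expected cost plus relative entropy.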
Similar resources
Nonlinear Stochastic Control and Information Theoretic Dualities: Connections, Interdependencies and Thermodynamic Interpretations
In this paper, we present connections between recent developments on the linearly-solvable stochastic optimal control framework and early work in control theory based on the fundamental dualities between free energy and relative entropy. We extend these connections to nonlinear stochastic systems with non-affine controls by using the generalized version of the Feynman–Kac lemma. We present alt...
Comparison of Kullback-Leibler, Hellinger and LINEX with Quadratic Loss Function in Bayesian Dynamic Linear Models: Forecasting of Real Price of Oil
In this paper, we examine the application of the Kullback-Leibler, Hellinger and LINEX loss functions in a Dynamic Linear Model (DLM), using 106 years of data on the real price of oil, from 1913 to 2018, to address the asymmetric problem in filtering and forecasting. We use the DLM form of the basic Hotelling model under the quadratic, Kullback-Leibler, Hellinger and LINEX loss functions, trying to address the ...
Comparison of Information Theoretic Divergences for Sensor Management
In this paper, we compare the information-theoretic metrics of the Kullback-Leibler (K-L) and Rényi (α) divergence formulations for sensor management. Information-theoretic metrics are well suited to sensor management because they afford comparisons between distributions resulting from different types of sensors under different actions. The difference in distributions can also be measured as ...
Policy Search for Path Integral Control
Path integral (PI) control defines a general class of control problems for which the optimal control computation is equivalent to an inference problem that can be solved by evaluation of a path integral over state trajectories. However, this potential is mostly unused in real-world problems because of two main limitations: first, current approaches can typically only be applied to learn openloo...
Information Theory and Sensitivity Bounds
This supporting information file presents the sensitivity bounds (SBs) from an information theory perspective. The general SBs (both transient and stationary) are obtained by a limiting process on the relative entropy between path distributions. The information theory perspective also provides intuitive and explicit formulas for the quantities of interest for both Discrete Time Markov Chains (D...