Learning from Demonstration Using MDP Induced Metrics
Authors
Abstract
In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the MDP structure in a kernel-based approach. Our approach rests on the insight that the MDP structure can be encapsulated in an adequate state-space metric. In particular, we show that, using MDP metrics, we can cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning, at a much lower online computational cost. Our method also generalizes better than supervised learning methods that fail to take the MDP structure into account.
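As a concrete illustration of the pipeline the abstract describes, here is a minimal sketch under assumptions the abstract does not spell out: the MDP metric is taken to be the bisimulation metric of Ferns et al. (2004) with reward weight 1 and transition weight gamma, the Kantorovich term is solved as an explicit transport LP via scipy, and the classifier is a simple Gaussian-kernel vote over demonstrated (state, action) pairs. The paper's exact metric and learner may differ.

    import numpy as np
    from scipy.optimize import linprog

    def kantorovich(p, q, d):
        """1-Wasserstein distance between discrete distributions p and q
        under the ground-metric matrix d, via the standard transport LP."""
        S = len(p)
        c = d.reshape(-1)                      # cost of moving mass from i to j
        A_eq = np.zeros((2 * S, S * S))
        for i in range(S):
            A_eq[i, i * S:(i + 1) * S] = 1.0   # row marginals must equal p
            A_eq[S + i, i::S] = 1.0            # column marginals must equal q
        res = linprog(c, A_eq=A_eq, b_eq=np.concatenate([p, q]),
                      bounds=(0, None), method="highs")
        return res.fun

    def mdp_metric(P, R, gamma=0.9, n_iter=20):
        """Fixed-point iteration of a Ferns-et-al.-style metric operator.
        P: (A, S, S) transition tensor, R: (S, A) reward matrix."""
        A, S, _ = P.shape
        d = np.zeros((S, S))
        for _ in range(n_iter):                # contraction with modulus gamma
            d_new = np.zeros_like(d)
            for s in range(S):
                for t in range(s + 1, S):
                    gap = max(abs(R[s, a] - R[t, a])
                              + gamma * kantorovich(P[a, s], P[a, t], d)
                              for a in range(A))
                    d_new[s, t] = d_new[t, s] = gap
            d = d_new
        return d

    def policy_from_demos(d, demos, sigma=0.5):
        """Cast learning from demonstration as classification: each state
        takes the action with the largest Gaussian-kernel-weighted vote
        among the demonstrated (state, action) pairs."""
        demo_s = np.array([s for s, _ in demos])
        demo_a = np.array([a for _, a in demos])
        policy = np.empty(d.shape[0], dtype=int)
        for s in range(d.shape[0]):
            w = np.exp(-d[s, demo_s] ** 2 / (2 * sigma ** 2))
            policy[s] = int(np.bincount(demo_a, weights=w).argmax())
        return policy

Once the metric has been computed offline, classifying a new state online costs only kernel evaluations against the demonstration set, which is consistent with the low online cost the abstract claims relative to inverse reinforcement learning.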
Similar resources
Knowledge Transfer in Markov Decision Processes
Markov Decision Processes (MDPs) are an effective way to formulate many problems in Machine Learning. However, learning the optimal policy for an MDP can be a time-consuming process, especially when nothing is known about the policy to begin with. An alternative approach is to find a similar MDP, for which an optimal policy is known, and modify this policy as needed. We present a framework for ...
Measuring the Distance Between Finite Markov Decision Processes
Markov decision processes (MDPs) have been studied for many decades. Recent research in using transfer learning methods to solve MDPs has shown that knowledge learned from one MDP may be used to solve a similar MDP better. In this paper, we propose two metrics for measuring the distance between finite MDPs. Our metrics are based on the Hausdorff metric, which measures the distance between two su...
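For reference, and as a hedged sketch only (the excerpt above is cut off before the construction details), the standard Hausdorff distance these metrics build on can be computed between any two finite sets given a ground distance; which sets of the two MDPs it is applied to is not recoverable from this excerpt.

    import numpy as np

    def hausdorff(X, Y, dist):
        """Hausdorff distance between finite sets X and Y under a ground
        distance dist(x, y): the larger of the two directed distances."""
        def directed(A, B):
            return max(min(dist(a, b) for b in B) for a in A)
        return max(directed(X, Y), directed(Y, X))

    # Toy usage with states represented as feature vectors.
    X = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
    Y = [np.array([0.0, 1.0]), np.array([2.0, 0.0])]
    print(hausdorff(X, Y, lambda a, b: float(np.linalg.norm(a - b))))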
Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics
Temporally extended actions are usually effective in speeding up reinforcement learning. In this paper we present a mechanism for automatically constructing such actions, expressed as options [Sutton et al., 1999], in a finite Markov Decision Process (MDP). To do this, we compute a bisimulation metric [Ferns et al., 2004] between the states in a small MDP and the states in a large MDP, which we...
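For reference, the bisimulation metric of [Ferns et al., 2004] that this construction relies on is the fixed point of an operator of roughly the following form, where c_R and c_T are weighting constants and T_K(d) is the Kantorovich (1-Wasserstein) distance between next-state distributions under d; the exact constants used in the paper are not stated in this excerpt:

    d(s, t) = \max_{a \in A} \Big( c_R \, \big| r(s,a) - r(t,a) \big| \;+\; c_T \, T_K(d)\big( P(\cdot \mid s, a),\, P(\cdot \mid t, a) \big) \Big)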
Kinesthetic Teaching via Trajectory Optimization with Common Distance Metrics
We propose a method for learning kinesthetic skills via trajectory optimization. We recreate demonstrations by optimizing the trajectory to be near the demonstration. We use the trajectory optimizer TrajOpt to turn distance metrics, the Fréchet and Hausdorff distances, into cost functions that penalize a trajectory for deviating from the demonstration. Using a few technique...
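As an illustration of one of the two distances mentioned, here is the standard discrete Fréchet distance between two sampled trajectories, computed with the Eiter-Mannila dynamic program. How the paper wires such a distance into a TrajOpt cost term is not shown in this excerpt, so the sketch below covers only the distance itself.

    import numpy as np

    def discrete_frechet(P, Q):
        """Discrete Frechet distance between curves P and Q (sequences of
        points), via the Eiter-Mannila dynamic program: the smallest leash
        length that lets two walkers traverse both curves monotonically."""
        P, Q = np.asarray(P, float), np.asarray(Q, float)
        n, m = len(P), len(Q)
        d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
        ca = np.zeros((n, m))
        ca[0, 0] = d[0, 0]
        for i in range(1, n):
            ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
        for j in range(1, m):
            ca[0, j] = max(ca[0, j - 1], d[0, j])
        for i in range(1, n):
            for j in range(1, m):
                ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                               d[i, j])
        return float(ca[-1, -1])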
Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning
We present a model-based approach to solving Markov decision processes (MDPs) in which the system dynamics are learned using conditional mean embeddings (CMEs). This class of methods comes with strong performance guarantees, and enables planning to be performed in an induced finite (pseudo-)MDP, which approximates the MDP, but can be solved exactly using dynamic programming. Two drawbacks of ex...
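To make the induced pseudo-MDP idea concrete, here is a hedged sketch of CME-based policy evaluation under simplifying assumptions not stated in the excerpt: a single fixed policy, a known reward function, and a Gaussian kernel. The compression scheme that is the paper's actual contribution is not shown.

    import numpy as np

    def rbf(A, B, sigma=1.0):
        """Gaussian kernel matrix between row-stacked point sets A and B."""
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma ** 2))

    def cme_policy_evaluation(S, S_next, reward_fn, gamma=0.95, lam=1e-3,
                              sigma=1.0, n_iter=500):
        """Policy evaluation in the finite pseudo-MDP induced by a CME fit
        on one-step transitions S[i] -> S_next[i] gathered under the policy
        being evaluated.  Column j of W holds the embedding weights at the
        sampled next state s'_j, so Bellman backups stay on the samples."""
        S, S_next = np.asarray(S, float), np.asarray(S_next, float)
        n = len(S)
        K = rbf(S, S, sigma)                        # Gram matrix on inputs
        K_cross = rbf(S, S_next, sigma)             # k(s_i, s'_j)
        W = np.linalg.solve(K + n * lam * np.eye(n), K_cross)
        r = np.array([reward_fn(s) for s in S_next])
        V = np.zeros(n)                             # values at sampled next states
        for _ in range(n_iter):
            # CME weights are pseudo-probabilities (they may be negative and
            # need not sum to one), which is why the induced model is only a
            # (pseudo-)MDP, solved here by repeated Bellman backups.
            V = r + gamma * W.T @ V
        return V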