Policy Gradient Methods for Automated Driving
نویسنده
چکیده
Automated driving requires designing a system capable of maintaining safety while simultaneously maintaining passenger comfort. Models of highway driving have high dimensionality and stochasticity traditionally specifying the histories for a large, varying number of agents in a continuous state and action space. Traditional value-based reinforcement learning methods require exponential time in the size of the state space and thus often require coarse state and action space discretization to remain tractable. Policy gradient methods instead maintain constant complexity proportional to the complexity of their parameterization and are thus candiates for use in such applications if suitable parameterizations can be found. This project explores several policy gradient methods against the problem of optimal autonomous highway driving framed as a Markov decision process.
منابع مشابه
Automatic measurement of instantaneous changes in the walls of carotid artery with sequential ultrasound images
Introduction: This study presents a computerized analyzing method for detection of instantaneous changes of far and near walls of the common carotid artery in sequential ultrasound images by applying the maximum gradient algorithm. Maximum gradient was modified and some characteristics were added from the dynamic programming algorithm for our applications. Methods: The algorithm was evaluat...
متن کاملSolving Deep Memory POMDPs with Recurrent Policy Gradients
This paper presents Recurrent Policy Gradients, a modelfree reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic elig...
متن کاملGradient-free Policy Architecture Search and Adaptation
We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to...
متن کاملEquilibrium Policy Gradients for Spatiotemporal Planning
In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doin...
متن کاملLagrange policy gradient
Most algorithms for reinforcement learning work by estimating action-value functions. Here we present a method that uses Lagrange multipliers, the costate equation, and multilayer neural networks to compute policy gradients. We show that this method can find solutions to time-optimal control problems, driving linear mechanical systems quickly to a target configuration. On these tasks its perfor...
متن کامل