Policy Gradient Methods for Automated Driving

Author

  • Tim A. Wheeler
Abstract

Automated driving requires designing a system that maintains safety while preserving passenger comfort. Models of highway driving are high-dimensional and stochastic, typically specifying the histories of a large, varying number of agents in a continuous state and action space. Traditional value-based reinforcement learning methods scale exponentially with the size of the state space and thus often require coarse state and action discretization to remain tractable. Policy gradient methods instead have complexity proportional to that of their parameterization and are therefore candidates for such applications if suitable parameterizations can be found. This project explores several policy gradient methods on the problem of optimal autonomous highway driving framed as a Markov decision process.
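As a rough illustration of the approach, the sketch below applies a REINFORCE-style (likelihood-ratio) policy gradient to a toy one-dimensional car-following task. The dynamics, reward weights, horizon, and linear-Gaussian policy are illustrative assumptions, not the driving model or parameterization used in the project.

```python
# Minimal REINFORCE sketch on a toy car-following task (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

def step(state, accel, dt=0.1):
    """Toy dynamics: state = (gap to lead vehicle, rate of change of the gap)."""
    gap, gap_rate = state
    gap_rate = gap_rate - accel * dt              # positive ego acceleration closes the gap
    gap = gap + gap_rate * dt
    # Reward trades off safety (stay near a 20 m gap) against comfort (small accelerations).
    reward = -abs(gap - 20.0) - 0.1 * accel ** 2
    return np.array([gap, gap_rate]), reward

def rollout(theta, horizon=50, sigma=0.5):
    """Sample one trajectory under a linear-Gaussian policy a ~ N(theta^T s, sigma^2)."""
    state = np.array([30.0, -2.0])                # start 30 m back, closing at 2 m/s
    states, actions, rewards = [], [], []
    for _ in range(horizon):
        accel = rng.normal(theta @ state, sigma)
        states.append(state)
        actions.append(accel)
        state, r = step(state, accel)
        rewards.append(r)
    return np.array(states), np.array(actions), np.array(rewards)

def reinforce_gradient(theta, n_episodes=100, sigma=0.5):
    """Monte Carlo estimate of grad_theta E[return] via the likelihood-ratio trick."""
    grad = np.zeros_like(theta)
    for _ in range(n_episodes):
        states, actions, rewards = rollout(theta, sigma=sigma)
        returns = np.cumsum(rewards[::-1])[::-1]  # reward-to-go from each step
        # grad log N(a | theta^T s, sigma^2) with respect to theta = (a - theta^T s) / sigma^2 * s
        score = (actions - states @ theta)[:, None] / sigma ** 2 * states
        grad += (score * returns[:, None]).sum(axis=0)
    return grad / n_episodes

theta = np.zeros(2)
for _ in range(100):
    theta += 1e-5 * reinforce_gradient(theta)     # plain gradient ascent, no baseline
print("learned feedback gains:", theta)
```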


Similar Resources

Automatic measurement of instantaneous changes in the walls of carotid artery with sequential ultrasound images

Introduction: This study presents a computerized analysis method for detecting instantaneous changes of the far and near walls of the common carotid artery in sequential ultrasound images by applying the maximum gradient algorithm. The maximum gradient algorithm was modified, and some characteristics from the dynamic programming algorithm were added for our applications. Methods: The algorithm was evaluat...


Solving Deep Memory POMDPs with Recurrent Policy Gradients

This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method for creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic elig...
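In rough terms, the mechanism is a REINFORCE-style update backpropagated through a recurrent network, so the hidden (memory) state is trained jointly with the action distribution. The sketch below uses a GRU policy on made-up trajectory data; it is only an interpretation of the idea, not the published algorithm, and all sizes and names are assumptions.

```python
# Sketch: return-weighted log-likelihoods backpropagated through an RNN policy (PyTorch).
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=4, hidden_dim=32, n_actions=3):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim) -> action logits at every time step
        h_seq, _ = self.gru(obs_seq)
        return self.head(h_seq)

policy = RecurrentPolicy()
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for one batch of sampled POMDP trajectories (random placeholders here).
obs = torch.randn(8, 20, 4)              # 8 episodes, 20 observation steps
actions = torch.randint(0, 3, (8, 20))   # actions taken during sampling
returns = torch.randn(8, 20)             # reward-to-go at each step

logits = policy(obs)
log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
# Return-weighted log-likelihood; backpropagation through time also trains the memory.
loss = -(log_probs * returns).mean()
optim.zero_grad()
loss.backward()
optim.step()
```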


Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to...


Equilibrium Policy Gradients for Spatiotemporal Planning

In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doin...


Lagrange policy gradient

Most algorithms for reinforcement learning work by estimating action-value functions. Here we present a method that uses Lagrange multipliers, the costate equation, and multilayer neural networks to compute policy gradients. We show that this method can find solutions to time-optimal control problems, driving linear mechanical systems quickly to a target configuration. On these tasks its perfor...
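The costate (adjoint) idea can be illustrated compactly on a discrete-time linear system with quadratic cost. In the sketch below the policy is a simple linear gain u = -Kx rather than a multilayer network, and the dynamics, costs, horizon, and step size are made-up values; it shows how a backward costate recursion yields the gradient of total cost with respect to the policy parameters.

```python
# Sketch: policy gradient via the costate (adjoint) recursion on a toy LQR-style problem.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double-integrator dynamics (position, velocity)
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                              # state cost weight
R = np.array([[0.01]])                     # control effort weight
T = 50                                     # horizon

def cost_and_gradient(K, x0):
    # Forward pass: roll out the closed-loop system and store the trajectory.
    xs, us = [x0], []
    for _ in range(T):
        u = -K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    J = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))

    # Backward pass: costate recursion lam_k = dc/dx_k + (A - B K)^T lam_{k+1},
    # where dc/dx_k includes the dependence of u_k = -K x_k on x_k.
    lam = np.zeros(2)
    grad = np.zeros_like(K)
    for k in reversed(range(T)):
        x, u = xs[k], us[k]
        grad += -np.outer(2 * R @ u + B.T @ lam, x)   # contribution through u_k = -K x_k
        lam = 2 * Q @ x - K.T @ (2 * R @ u) + (A - B @ K).T @ lam
    return J, grad

K = np.zeros((1, 2))
x0 = np.array([1.0, 0.0])
for _ in range(300):
    J, grad = cost_and_gradient(K, x0)
    K -= 1e-4 * grad                       # gradient descent on total cost
print("cost:", J, "gains:", K)
```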




Publication date: 2015