Policy Iteration for Learning an Exercise Policy for American Options
Authors
Abstract
Options are important financial instruments whose prices are usually determined by computational methods. Computational finance is a compelling application area for reinforcement learning research, where hard sequential decision-making problems abound and have great practical significance. In this paper, we investigate reinforcement learning methods, in particular least squares policy iteration (LSPI), for the problem of learning an exercise policy for American options. We also investigate TVR, another policy iteration method. We compare LSPI and TVR with LSM, the standard least squares Monte Carlo method from the finance community, and evaluate their performance on both real and synthetic data. The results show that the exercise policies discovered by LSPI and TVR gain larger payoffs than those discovered by LSM, on both real and synthetic data. Furthermore, for LSPI, TVR, and LSM, policies learned from real data generally gain larger payoffs than policies learned from simulated samples. Our work shows that solution methods developed in reinforcement learning can advance the state of the art in an important and challenging application area, and demonstrates furthermore that computational finance remains an under-explored area for the deployment of reinforcement learning methods.
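To make the LSM baseline concrete, the following is a minimal sketch of the Longstaff–Schwartz least squares Monte Carlo method for pricing an American put. All parameters (spot, strike, rate, volatility, the quadratic polynomial basis) are illustrative assumptions, not values taken from the paper; the paper's contribution is to replace this backward-induction regression with policy iteration methods such as LSPI.

```python
import numpy as np

def lsm_american_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2,
                     T=1.0, n_steps=50, n_paths=20000, seed=0):
    """Sketch of Longstaff-Schwartz LSM; parameters are illustrative."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Simulate geometric Brownian motion price paths under the risk-neutral measure.
    z = rng.standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1)
    s = s0 * np.exp(log_paths)

    # Initialize with the payoff from exercising only at maturity.
    cash = np.maximum(strike - s[:, -1], 0.0)
    tau = np.full(n_paths, n_steps - 1)  # exercise step chosen on each path

    # Backward induction: regress discounted future cash flows on basis
    # functions of the current price, using in-the-money paths only.
    for t in range(n_steps - 2, -1, -1):
        itm = strike - s[:, t] > 0
        if not itm.any():
            continue
        x = s[itm, t]
        y = cash[itm] * np.exp(-r * dt * (tau[itm] - t))
        # Quadratic polynomial basis (1, x, x^2), a common simple choice.
        coef = np.polyfit(x, y, 2)
        continuation = np.polyval(coef, x)
        exercise = strike - x
        early = exercise > continuation  # exercise where immediate payoff wins
        idx = np.where(itm)[0][early]
        cash[idx] = exercise[early]
        tau[idx] = t

    # Discount each path's cash flow back to time zero and average.
    return float(np.mean(cash * np.exp(-r * dt * (tau + 1))))

price = lsm_american_put()
print(f"Estimated American put price: {price:.2f}")
```

The exercise policy implicit in this procedure is "exercise whenever the immediate payoff exceeds the regression-estimated continuation value"; LSPI and TVR instead learn such a policy by iterating over value-function approximations.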
Related Work
Learning an Exercise Policy for American Options from Real Data
We study approaches to learning an exercise policy for American options directly from real data. We investigate an approximate policy iteration method, namely, least squares policy iteration (LSPI), for the problem of pricing American options. We also extend the standard least squares Monte Carlo (LSM) method of Longstaff and Schwartz, by composing sample paths from real data. We test the perfo...
Learning Exercise Policies for American Options
Options are important instruments in modern finance. In this paper, we investigate reinforcement learning (RL) methods, in particular least-squares policy iteration (LSPI), for the problem of learning exercise policies for American options. We develop finite-time bounds on the performance of the policy obtained with LSPI and compare LSPI and the fitted Q-iteration algorithm (FQI) with the Longs...
Policy iteration for American options: overview
This paper is an overview of recent results by Kolodko and Schoenmakers (2006) and Bender and Schoenmakers (2006) on the evaluation of options with early exercise opportunities via policy improvement. Stability is discussed, and simulation results based on plain Monte Carlo estimators for conditional expectations are presented.
Learning Robust Options
Robust reinforcement learning aims to produce policies with strong guarantees even in the face of environments or transition models whose parameters are highly uncertain. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Po...
Enhanced policy iteration for American options via scenario selection
In Kolodko & Schoenmakers [9] and Bender & Schoenmakers [3], a policy iteration was introduced that makes it possible to achieve tight lower approximations of the price of early exercise options via a nested Monte Carlo simulation in a Markovian setting. In this paper we enhance the algorithm with a scenario selection method. It is demonstrated by numerical examples that the scenario selection can signifi...
Journal:
Volume, Issue:
Pages: -
Publication date: 2008