The QLBS Q-Learner Goes NuQLear: Fitted Q Iteration, Inverse RL, and Option Portfolios
Author
Abstract
The QLBS model is a discrete-time option hedging and pricing model based on Dynamic Programming (DP) and Reinforcement Learning (RL). It combines the famous Q-Learning method for RL with the Black-Scholes(-Merton) model’s idea of reducing option pricing and hedging to the optimal rebalancing of a dynamic replicating portfolio for the option, made of a stock and cash. Here we expand on several NuQLear (Numerical Q-Learning) topics with the QLBS model. First, we investigate the performance of Fitted Q Iteration for an RL (data-driven) solution to the model, and benchmark it against a DP (model-based) solution as well as against the BSM model. Second, we develop an Inverse Reinforcement Learning (IRL) setting for the model, where we observe only prices and actions (re-hedges) taken by a trader, but not rewards. Third, we outline how the QLBS model can be used for pricing portfolios of options, rather than a single option in isolation, thus providing its own data-driven and model-independent solution to the (in)famous volatility smile problem of the Black-Scholes model.
I would like to thank Eric Berger and Vivek Kapoor for stimulating discussions. I thank Bohui Xi, Tianrui Zhao, and Yuhan Liu for an initial implementation of a DP solution of the QLBS model.
arXiv:1801.06077v1 [q-fin.CP] 17 Jan 2018
Similar resources
Reinforcement Learning of Multi-Party Trading Dialog Policies
Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and these kinds of dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, in previous research, the focus was on negotiation dialog between two participants only, ignoring c...
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds
This paper presents a discrete-time option pricing model that is rooted in Reinforcement Learning (RL), and more specifically in the famous Q-Learning method of RL. We construct a risk-adjusted Markov Decision Process for a discrete-time version of the classical Black-Scholes-Merton (BSM) model, where the option price is an optimal Q-function, while the optimal hedge is a second argument of this ...
Reinforcement Learning in Multi-Party Trading Dialog
In this paper, we apply reinforcement learning (RL) to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner ...
A New Inexact Inverse Subspace Iteration for Generalized Eigenvalue Problems
In this paper, we present an inexact inverse subspace iteration method for computing a few eigenpairs of the generalized eigenvalue problem Ax = λBx [Q. Ye and P. Zhang, Inexact inverse subspace iteration for generalized eigenvalue problems, Linear Algebra and its Applications, 434 (2011) 1697-1715]. In particular, the linear convergence property of the inverse subspace iteration is preserved.
Optimizing Spoken Dialogue Management from Data Corpora with Fitted Value Iteration
In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...
Journal title:
- CoRR
Volume abs/1801.06077
Pages -
Publication date 2018