Algorithms for CVaR Optimization in MDPs
Abstract
In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures and, because of its computational efficiency, has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
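For readers unfamiliar with the risk measure, the following is a minimal sketch of how VaR and CVaR could be estimated from sampled cumulative costs. It is not the paper's algorithm: it only evaluates the standard Rockafellar-Uryasev representation, CVaR_α(Z) = min_ν { ν + E[(Z − ν)⁺] / (1 − α) }, on a finite sample. The function name `empirical_var_cvar` and the choice of NumPy are assumptions made for illustration.

```python
import math
import numpy as np

def empirical_var_cvar(costs, alpha):
    """Hypothetical helper: empirical VaR and CVaR of sampled costs.

    costs: 1-D sequence of sampled cumulative costs (higher is worse).
    alpha: confidence level in (0, 1).
    """
    z = np.sort(np.asarray(costs, dtype=float))
    n = z.size
    # VaR: smallest sample whose empirical CDF value is at least alpha.
    k = max(math.ceil(alpha * n) - 1, 0)
    var = z[k]
    # Rockafellar-Uryasev objective evaluated at nu = VaR, which minimizes
    # nu + E[(Z - nu)^+] / (1 - alpha) over nu for the empirical distribution.
    cvar = var + np.maximum(z - var, 0.0).mean() / (1.0 - alpha)
    return float(var), float(cvar)

# Usage: ten equally likely costs, alpha = 0.8.
var, cvar = empirical_var_cvar([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], alpha=0.8)
print(var, cvar)
```

In this usage example the CVaR coincides, up to floating-point rounding, with the average of the worst 20% of the costs, which is the intuition behind the measure: it summarizes the tail beyond the VaR threshold rather than the spread around the mean.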
Related resources
Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such problem as CVaR MDP. Our first contribution is to show that a CVaR...
Policy Gradients for CVaR-Constrained MDPs
We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR). We propose two algorithms that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini-batches, policy gradients and importance sampling. Both algorithms incorporate a CVaR estimation procedure, along t...
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented v...
Parametric Comparison of the Efficient Frontiers of the Markowitz, Value at Risk and Conditional Value at Risk Risk Management Models Using the Simulated Annealing Optimization Algorithm on the Tehran Stock Exchange
Nowadays risk management is as vital as gaining the maximum return. Therefore, research on risk management and its different models is very useful for investors. Using a local (fmincon function) and a global (simulated annealing) optimization algorithm based on three risk management models, namely Markowitz, Value at Risk (VaR) and Conditional Value at Risk (CVaR), this research see...
Robust Portfolio Optimization with risk measure CVAR under MGH distribution in DEA models
Financial returns exhibit stylized facts such as leptokurtosis, skewness and heavy-tailedness. Given this behavior, in this paper we apply the multivariate generalized hyperbolic (mGH) distribution for portfolio modeling and performance evaluation, using conditional value at risk (CVaR) as a risk measure and allocating the best weights for portfolio selection. Moreover, a robust portfolio optimizati...