Risk-averse approximate dynamic programming with quantile-based risk measures

DOI: 10.1287/MOOR.2017.0872; zbMATH Open: 1440.90084; arXiv: 1509.01920; OpenAlex: W2462780152; MaRDI QID: Q5219554; FDO: Q5219554

Publication date: 12 March 2020

Published in: Mathematics of Operations Research

Abstract: In this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs; we call the overall objective a dynamic quantile-based risk measure (DQBRM). In particular, we consider optimizing dynamic risk measures where the one-step risk measures are QBRMs, a class of risk measures that includes the popular value at risk (VaR) and the conditional value at risk (CVaR). Although there is considerable theoretical development of risk-averse MDPs in the literature, the computational challenges have not been explored as thoroughly. We propose data-driven and simulation-based approximate dynamic programming (ADP) algorithms to solve the risk-averse sequential decision problem. We address the issue of inefficient sampling for risk applications in simulated settings and present a procedure, based on importance sampling, to direct samples toward the "risky region" as the ADP algorithm progresses. Finally, we show numerical results of our algorithms in the context of an application involving risk-averse bidding for energy storage.
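The abstract names VaR and CVaR as members of the QBRM class but does not define them. As a rough illustration only, the sketch below estimates both from a sample of simulated costs, assuming the standard definitions: VaR at level alpha is the alpha-quantile of the cost distribution, and CVaR is computed with the Rockafellar-Uryasev formula VaR + E[(C - VaR)^+] / (1 - alpha). The function names `empirical_var` and `empirical_cvar` are hypothetical and are not taken from the paper, and this is not the paper's ADP algorithm.

```python
import math


def empirical_var(costs, alpha):
    """Empirical value at risk: the smallest sampled cost c such that
    at least a fraction alpha of the samples are <= c.
    (Assumed standard quantile definition, not quoted from the paper.)"""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not costs:
        raise ValueError("costs must be non-empty")
    ordered = sorted(costs)
    # 1-indexed rank of the alpha-quantile among the ordered samples.
    # Floating-point rounding in alpha * n can shift the rank by one
    # for some alpha values.
    rank = max(math.ceil(alpha * len(ordered)), 1)
    return ordered[rank - 1]


def empirical_cvar(costs, alpha):
    """Empirical conditional value at risk via the Rockafellar-Uryasev
    formula: VaR + E[(C - VaR)^+] / (1 - alpha)."""
    var = empirical_var(costs, alpha)
    mean_excess = sum(max(c - var, 0.0) for c in costs) / len(costs)
    return var + mean_excess / (1.0 - alpha)


if __name__ == "__main__":
    sample_costs = list(range(1, 11))  # illustrative costs 1, 2, ..., 10
    print(empirical_var(sample_costs, 0.5))
    print(empirical_cvar(sample_costs, 0.5))
```

Because only samples above the quantile contribute to the excess term, few draws from a plain simulation are informative at high alpha. This is the kind of sampling inefficiency that the importance sampling procedure in the abstract is meant to address.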

Full work available at URL: https://arxiv.org/abs/1509.01920

zbMATH Keywords

approximate dynamic programming; reinforcement learning; Q-learning; dynamic risk measures; energy trading

Mathematics Subject Classification ID

Decision theory (91B06); Stochastic approximation (62L20); Dynamic programming (90C39); Stochastic learning and adaptive control (93E35)

Cited In (6)
