Reinforcement learning with dynamic convex risk measures

DOI: 10.1111/MAFI.12388; arXiv: 2112.13414; OpenAlex: W4225482307; MaRDI QID: Q6196296; FDO: Q6196296

Publication date: 14 March 2024

Published in: Mathematical Finance

Abstract: We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
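As a rough illustration of the time-consistent dynamic programming idea mentioned in the abstract, the sketch below evaluates a fixed policy by applying a one-step risk measure recursively, backward in time. It is not the paper's method: the paper is model-free and uses neural networks, whereas this sketch assumes a known, tabular transition matrix and uses CVaR as one example of a convex risk measure. The names `cvar`, `evaluate_policy`, `P`, and `cost` are hypothetical and chosen only for this example.

```python
import numpy as np


def cvar(values, probs, alpha):
    """CVaR at level alpha (0 <= alpha < 1) of a discrete cost distribution.

    Computed as the average of the worst (largest-cost) outcomes that
    together carry probability mass 1 - alpha.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-values)  # largest cost first
    tail = 1.0 - alpha
    remaining = tail
    total = 0.0
    for v, p in zip(values[order], probs[order]):
        take = min(p, remaining)  # mass taken from this outcome
        total += take * v
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail


def evaluate_policy(P, cost, horizon, alpha):
    """Nested risk evaluation of a fixed policy (illustrative assumption).

    P[s, s2] is the transition probability under the policy and
    cost[s, s2] is the cost of moving from s to s2. The recursion is
    V_t(s) = CVaR_alpha(cost[s, .] + V_{t+1}(.)), with terminal value 0.
    """
    n = P.shape[0]
    V = np.zeros(n)
    for _ in range(horizon):
        V = np.array([cvar(cost[s] + V, P[s], alpha) for s in range(n)])
    return V


# Toy two-state chain: each step costs 0 or 1 with equal probability.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
cost = np.array([[0.0, 1.0], [0.0, 1.0]])
risk_neutral = evaluate_policy(P, cost, horizon=2, alpha=0.0)
risk_averse = evaluate_policy(P, cost, horizon=2, alpha=0.5)
```

With `alpha=0.0` the one-step CVaR reduces to the expectation, so the recursion recovers the usual expected total cost; larger `alpha` values weight the worst outcomes more heavily at every step, which is what makes the nested evaluation risk-sensitive.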

Full work available at URL: https://arxiv.org/abs/2112.13414


zbMATH Keywords

time-consistency; reinforcement learning; actor-critic algorithm; robot control; dynamic risk measures; trading strategies; policy gradient; financial hedging

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)
Statistical methods; risk measures (91G70)
Optimal stochastic control (93E20)


Cited In (8)
