Choquet Regularization for Continuous-Time Reinforcement Learning
Publication: Q6073554
DOI: 10.1137/22M1524734
arXiv: 2208.08497
OpenAlex: W4386750089
MaRDI QID: Q6073554
Ruodu Wang, Xun Yu Zhou, Xia Han
Publication date: 11 October 2023
Published in: SIAM Journal on Control and Optimization
Abstract: We propose Choquet regularizers to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)), in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case via statically maximizing a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as ε-greedy, exponential, uniform, and Gaussian.
Full work available at URL: https://arxiv.org/abs/2208.08497
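The abstract above builds on signed Choquet integrals evaluated on exploration distributions. A minimal numerical sketch (not from the paper; the function name and the illustrative choice h(u) = u(1 − u) are assumptions) of how such an integral can be computed for an empirical sample:

```python
import numpy as np

def choquet_regularizer(sample, h):
    """Empirical signed Choquet integral of a sample under a distortion h.

    Uses the standard discrete formula
        I_h = sum_i x_(i) * (h(S_{i-1}) - h(S_i)),
    where x_(1) <= ... <= x_(n) is the sorted sample and
    S_i = (n - i) / n is the empirical survival probability above x_(i).
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    surv = (n - np.arange(n + 1)) / n      # S_0 = 1, S_1, ..., S_n = 0
    weights = h(surv[:-1]) - h(surv[1:])   # h(S_{i-1}) - h(S_i)
    return float(np.dot(x, weights))

# h(u) = u recovers the mean; a concave h with h(0) = h(1) = 0, such as
# h(u) = u * (1 - u), yields a location-invariant variability measure
# (half the Gini mean difference) -- the kind of quantity used as a
# Choquet regularizer to reward the spread of an exploration law.
gini_h = lambda u: u * (1.0 - u)
```

Location invariance follows because the weights sum to h(1) − h(0), which vanishes whenever h(0) = h(1) = 0, so shifting the sample leaves the value unchanged; only the spread of the distribution is rewarded.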
Keywords: quantile; continuous time; reinforcement learning; exploration; linear-quadratic control; HJB equations; Choquet integrals; regularizers
Cites Work
- Coherent measures of risk
- The Dual Theory of Choice under Risk
- Maxmin expected utility with non-unique prior
- Non-additive measure and integral
- Generalized deviations in risk analysis
- Subjective Probability and Expected Utility without Additivity
- Advances in prospect theory: cumulative representation of uncertainty
- Convex measures of risk and trading constraints
- Cumulative Residual Entropy: A New Measure of Information
- Axiomatic characterization of insurance prices
- Quantile based entropy function
- Maximum Entropy Principle with General Deviation Measures
- Stochastic finance. An introduction in discrete time.
- Convex risk functionals: representation and applications
- Nonmonotonic Choquet integrals
- Variance Formulas for the Mean Difference and Coefficient of Concentration
- Ambiguity in portfolio selection
- Linear-quadratic approximation of optimal policy problems
- Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations
- Some properties of the cumulative residual entropy of coherent and mixed systems
- Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic systems
- DISTORTION RISKMETRICS ON GENERAL SPACES
- Are law-invariant risk functions concave on distributions?
- Characterization, Robustness, and Aggregation of Signed Choquet Integrals
- Continuous‐time mean–variance portfolio selection: A reinforcement learning framework
- Parametric measures of variability induced by risk measures
- On a family of coherent measures of variability
- Entropy Regularization for Mean Field Games with Learning
- Exploratory HJB Equations and Their Convergence
- State-Dependent Temperature Control for Langevin Diffusions
- Dual Moments and Risk Attitudes
Cited In (3)