Choquet Regularization for Continuous-Time Reinforcement Learning
From MaRDI portal
Publication:6073554
Abstract: We propose Choquet regularizers to measure and manage the level of exploration in reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) by replacing the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem and solve it explicitly in the linear--quadratic (LQ) case by statically maximizing a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as ε-greedy, exponential, uniform and Gaussian.
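As a rough illustration of the quantity the abstract refers to: a Choquet regularizer built from a signed Choquet integral can be written (under assumptions used in this line of work) as Φ_h(X) = ∫ h(P(X > x)) dx for a concave distortion function h with h(0) = h(1) = 0, which makes Φ_h location invariant and positively homogeneous. The sketch below is illustrative only — the function names and the Gini-type choice h(u) = u(1 − u) are assumptions, not taken from the paper — and evaluates Φ_h numerically for a Gaussian sampler:

```python
import math

def gaussian_sf(x, mu=0.0, sigma=1.0):
    # Survival function P(X > x) of N(mu, sigma^2).
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def choquet_regularizer(h, sf, lo=-20.0, hi=20.0, n=200_000):
    # Midpoint-rule approximation of Phi_h(X) = integral of h(P(X > x)) dx,
    # which converges without tail corrections when h(0) = h(1) = 0.
    dx = (hi - lo) / n
    return sum(h(sf(lo + (i + 0.5) * dx)) * dx for i in range(n))

# Gini-type concave distortion with h(0) = h(1) = 0 (an illustrative choice):
h = lambda u: u * (1.0 - u)

# For this h, Phi_h(X) = (1/2) E|X - X'| (half the Gini mean difference),
# which for N(mu, sigma^2) equals sigma / sqrt(pi) regardless of mu.
val = choquet_regularizer(h, lambda x: gaussian_sf(x, mu=1.0, sigma=2.0))
```

The result is independent of the mean and scales linearly in sigma, matching the role of such regularizers as location-invariant measures of the randomness (exploration level) of a sampling distribution.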
Cites work
- scientific article; zbMATH DE number 1241609
- scientific article; zbMATH DE number 1795842
- scientific article; zbMATH DE number 1795125
- scientific article; zbMATH DE number 7307478
- Advances in prospect theory: cumulative representation of uncertainty
- Ambiguity in portfolio selection
- Are law-invariant risk functions concave on distributions?
- Axiomatic characterization of insurance prices
- Characterization, robustness, and aggregation of signed Choquet integrals
- Coherent measures of risk
- Continuous‐time mean–variance portfolio selection: A reinforcement learning framework
- Convex measures of risk and trading constraints
- Convex risk functionals: representation and applications
- Cumulative Residual Entropy: A New Measure of Information
- Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations
- Distortion riskmetrics on general spaces
- Dual moments and risk attitudes
- Entropy Regularization for Mean Field Games with Learning
- Exploratory HJB equations and their convergence
- Generalized deviations in risk analysis
- Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic system
- Linear-quadratic approximation of optimal policy problems
- Maximum entropy principle with general deviation measures
- Maxmin expected utility with non-unique prior
- Non-additive measure and integral
- Nonmonotonic Choquet integrals
- On a family of coherent measures of variability
- Parametric measures of variability induced by risk measures
- Quantile based entropy function
- Some properties of the cumulative residual entropy of coherent and mixed systems
- State-Dependent Temperature Control for Langevin Diffusions
- Stochastic finance. An introduction in discrete time.
- Subjective Probability and Expected Utility without Additivity
- The Dual Theory of Choice under Risk
- Variance Formulas for the Mean Difference and Coefficient of Concentration
Cited in (3)