Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
From MaRDI portal
Publication:6136230
Abstract: We study a Q-learning algorithm for continuous-time stochastic control problems. The proposed algorithm uses samples of the state process obtained by discretizing the state and control-action spaces under piecewise-constant control processes. We show that the algorithm converges to the solution of the optimality equation of a finite Markov decision process (MDP). Using this MDP model, we provide an upper bound on the approximation error for the optimal value function of the continuous-time control problem. Furthermore, we present provable upper bounds on the performance loss of the learned control process relative to the optimal admissible control process of the original problem. The error upper bounds are functions of the time- and space-discretization parameters, and they reveal the effect of the different levels of approximation: (i) approximation of the continuous-time control problem by an MDP, (ii) use of piecewise-constant control processes, and (iii) space discretization. Finally, we state a time-complexity bound for the proposed algorithm as a function of the time- and space-discretization parameters.
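The scheme the abstract describes can be sketched as follows. This is an illustrative toy, not the paper's exact construction: a 1-D controlled diffusion dX_t = u_t dt + sigma dW_t with running cost x^2 + u^2 is simulated by Euler-Maruyama, the state is quantized to a finite grid, controls are held piecewise constant over steps of length h, and tabular Q-learning is run on the induced finite MDP with discount factor exp(-beta*h). All model parameters (grid, cost, coefficients) are assumptions chosen for illustration.

```python
import numpy as np

# Hedged sketch of Q-learning on a time/space-discretized controlled diffusion
#   dX_t = u_t dt + sigma dW_t,  running cost c(x, u) = x^2 + u^2.
# The dynamics, cost, and discretization choices are illustrative assumptions.

rng = np.random.default_rng(0)

h = 0.1                       # time-discretization step
sigma = 0.5                   # diffusion coefficient
beta = 1.0                    # continuous-time discount rate
gamma = np.exp(-beta * h)     # discount factor of the induced finite MDP

x_grid = np.linspace(-2.0, 2.0, 21)   # space discretization of the state
u_grid = np.array([-1.0, 0.0, 1.0])   # finite control-action set

def nearest_state(x):
    """Project a continuous state onto the finite grid."""
    return int(np.argmin(np.abs(x_grid - x)))

def step(s, a):
    """One Euler-Maruyama step under a control held constant for time h."""
    x, u = x_grid[s], u_grid[a]
    x_next = x + u * h + sigma * np.sqrt(h) * rng.standard_normal()
    x_next = np.clip(x_next, x_grid[0], x_grid[-1])
    cost = (x**2 + u**2) * h          # running cost accumulated over the step
    return nearest_state(x_next), cost

# Tabular Q-learning on the sampled, discretized process.
Q = np.zeros((len(x_grid), len(u_grid)))
alpha = 0.1                   # learning rate
s = nearest_state(0.0)
for _ in range(20000):
    # epsilon-greedy exploration over the finite action set
    a = int(rng.integers(len(u_grid))) if rng.random() < 0.2 else int(np.argmin(Q[s]))
    s_next, cost = step(s, a)
    # update toward the discounted-cost optimality equation of the finite MDP
    Q[s, a] += alpha * (cost + gamma * Q[s_next].min() - Q[s, a])
    s = s_next

# The learned piecewise-constant feedback: one control value per grid state.
policy = u_grid[np.argmin(Q, axis=1)]
```

The three approximation levels in the abstract appear directly in the sketch: the MDP approximation (the transition kernel induced by sampling over steps of length h), the piecewise-constant controls (one action per step), and the space discretization (the projection onto `x_grid`).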
Recommendations
- Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach
- Convergence of discretization procedure in \(Q\)-learning
- Convergence of a Q-learning variant for continuous states and actions
- Integral \(Q\)-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems
- A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization
Cites work
- scientific article; zbMATH DE number 1577097
- scientific article; zbMATH DE number 1804127
- scientific article; zbMATH DE number 1166343
- scientific article; zbMATH DE number 7625164
- scientific article; zbMATH DE number 7307478
- Algorithms for reinforcement learning.
- An analysis of temporal-difference learning with function approximation
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- Approximating value functions for controlled degenerate diffusion processes by using piece-wise constant policies.
- Asynchronous stochastic approximation and Q-learning
- Continuous‐time mean–variance portfolio selection: A reinforcement learning framework
- Error Bounds for Monotone Approximation Schemes for Hamilton--Jacobi--Bellman Equations
- Improved order 1/4 convergence for piecewise constant policy approximation of stochastic control problems
- Markov chains and stochastic stability
- Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
- Neural networks-based backward scheme for fully nonlinear PDEs
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- On the convergence rate of approximation schemes for Hamilton-Jacobi-Bellman Equations
- On the rate of convergence of finite-difference approximations for Bellman's equations with variable coefficients
- Policy gradient in continuous time
- Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods
- Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
- Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
- Variational estimation of the drift for stochastic differential equations from the empirical density
- \({\mathcal Q}\)-learning
Cited in (11)
- scientific article; zbMATH DE number 2000822
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- Temporal difference-based policy iteration for optimal control of stochastic systems
- Data-driven approximate Q-learning stabilization with optimality error bound analysis
- Minimax Q-learning control for linear systems using the Wasserstein metric
- A generalization error for Q-learning
- Continuity of cost in Borkar control topology and implications on discrete space and time approximations for controlled diffusions under several criteria
- A Q-learning algorithm for Markov decision processes with continuous state spaces
- Optimal learning with \textit{Q}-aggregation
- Reinforcement Learning for Linear-Convex Models with Jumps via Stability Analysis of Feedback Controls