Algorithms for reinforcement learning.
DOI: 10.2200/S00268ED1V01Y201005AIM009; zbMATH Open: 1205.68320; OpenAlex: W4211221179; MaRDI QID: Q3588852; FDO: Q3588852
Authors: Csaba Szepesvári
Publication date: 10 September 2010
Published in: Synthesis Lectures on Artificial Intelligence and Machine Learning
Full work available at URL: https://doi.org/10.2200/s00268ed1v01y201005aim009
Keywords: planning, simulation, stochastic approximation, least-squares methods, Markov decision processes, online learning, reinforcement learning, function approximation, Q-learning, PAC-learning, active learning, natural gradient, bias-variance tradeoff, overfitting, stochastic gradient methods, policy gradient, temporal difference learning, actor-critic methods
Mathematics Subject Classification: Learning and adaptive systems in artificial intelligence (68T05); Research exposition (monographs, survey articles) pertaining to computer science (68-02); Dynamic programming (90C39); Nonnumerical algorithms (68W05); Markov and semi-Markov decision processes (90C40)
Cited In (55)
- Undiscounted reinforcement learning algorithm based on performance potentials
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
- Efficient augmentation and relaxation learning for individualized treatment rules using observational data
- Investigating the properties of neural network representations in reinforcement learning
- On learning and branching: a survey
- Closed-form Approximations in Multi-asset Market Making
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Model selection in reinforcement learning
- Adaptive representations for reinforcement learning.
- Statistical reinforcement learning. Modern machine learning approaches
- Dynamic treatment regimes: technical challenges and applications
- Convergence of entropy-regularized natural policy gradient with linear function approximation
- Bayesian exploration for approximate dynamic programming
- Markov decision processes with sequential sensor measurements
- Adaptive playouts for online learning of policies during Monte Carlo tree search
- Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values
- Editorial: some recent advances in learning and adaptation for uncertain feedback control systems
- Optimal activation of halting multi‐armed bandit models
- Asymptotic analysis of value prediction by well-specified and misspecified models
- A Reinforcement Learning Neural Network for Robotic Manipulator Control
- Non-parametric policy search with limited information loss
- Structure in machine learning
- Robust adaptive dynamic programming for linear and nonlinear systems: an overview
- A systematic study on meta-heuristic approaches for solving the graph coloring problem
- Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains
- On convergence of value iteration for a class of total cost Markov decision processes
- Title not available
- A convex optimization approach to dynamic programming in continuous state and action spaces
- TEXPLORE: temporal difference reinforcement learning for robots and time-constrained domains
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
- Empirical \(Q\)-value iteration
- Reinforcement learning. An introduction
- Efficient model-based reinforcement learning for approximate online optimal control
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
- Reinforcement learning agents
- Title not available
- Deep reinforcement trading with predictable returns
- Title not available
- Formalization of methods for the development of autonomous artificial intelligence systems
- Crowd computing as a cooperation problem: An evolutionary approach
- Title not available
- Online spatio-temporal matching in stochastic and dynamic domains
- Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search
- Modern Bayesian experimental design
- Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning
- Deep exploration via randomized value functions
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- Reinforcement learning theory, algorithms and its application
- Reinforcement learning algorithms with function approximation: recent advances and applications
- Fundamental design principles for reinforcement learning algorithms
- Proximal algorithms and temporal difference methods for solving fixed point problems
- Decision making under uncertainty and reinforcement learning. Theory and algorithms
- A unified framework for stochastic optimization