Algorithms for Reinforcement Learning

From MaRDI portal
Publication:3588852


DOI: 10.2200/S00268ED1V01Y201005AIM009
zbMath: 1205.68320
OpenAlex: W4211221179
MaRDI QID: Q3588852

Csaba Szepesvári

Publication date: 10 September 2010

Published in: Synthesis Lectures on Artificial Intelligence and Machine Learning

Full work available at URL: https://doi.org/10.2200/s00268ed1v01y201005aim009



Related Items

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
A convex optimization approach to dynamic programming in continuous state and action spaces
Adaptive playouts for online learning of policies during Monte Carlo tree search
A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values
Closed-form Approximations in Multi-asset Market Making
Online spatio-temporal matching in stochastic and dynamic domains
Unnamed Item
Efficient model-based reinforcement learning for approximate online optimal control
Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning
Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search
A systematic study on meta-heuristic approaches for solving the graph coloring problem
On learning and branching: a survey
Robust adaptive dynamic programming for linear and nonlinear systems: an overview
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Optimal activation of halting multi‐armed bandit models
Formalization of methods for the development of autonomous artificial intelligence systems
Dynamic treatment regimes: technical challenges and applications
Deep reinforcement trading with predictable returns
Model selection in reinforcement learning
Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
Reinforcement learning algorithms with function approximation: recent advances and applications
Crowd computing as a cooperation problem: An evolutionary approach
Asymptotic analysis of value prediction by well-specified and misspecified models
Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains
A unified framework for stochastic optimization
Markov decision processes with sequential sensor measurements
A Reinforcement Learning Neural Network for Robotic Manipulator Control
Unnamed Item
Unnamed Item
Proximal algorithms and temporal difference methods for solving fixed point problems
Some recent advances in learning and adaptation for uncertain feedback control systems
Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
Bayesian Exploration for Approximate Dynamic Programming
Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes
Fundamental design principles for reinforcement learning algorithms
Empirical Q-Value Iteration
Unnamed Item
Unnamed Item


Uses Software