scientific article
zbMath: 1317.68150
MaRDI QID: Q2934010
Christoph Dann, Gerhard Neumann, Jan Peters
Publication date: 8 December 2014
Full work available at URL: http://jmlr.csail.mit.edu/papers/v15/dann14a.html
Title: Policy evaluation with temporal differences: a survey and comparison
Classification (MSC):
Point estimation (62F10)
Markov processes: estimation; hidden Markov models (62M05)
Learning and adaptive systems in artificial intelligence (68T05)
Research exposition (monographs, survey articles) pertaining to statistics (62-02)
Markov and semi-Markov decision processes (90C40)
Research exposition (monographs, survey articles) pertaining to computer science (68-02)
Related Items
Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
An incremental off-policy search in a model-free Markov decision process using a single sample path
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
Simple and Optimal Methods for Stochastic Variational Inequalities, II: Markovian Noise and Policy Evaluation in Reinforcement Learning
Hybrid SGD algorithms to solve stochastic composite optimization problems with application in sparse portfolio selection problems
Stochastic composition optimization of functions without Lipschitz continuous gradient
Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Multi-agent natural actor-critic reinforcement learning algorithms
Approximated multi-agent fitted Q iteration
On Generalized Bellman Equations and Temporal-Difference Learning
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Off-policy temporal difference learning with distribution adaptation in fast mixing chains
MultiLevel Composite Stochastic Optimization via Nested Variance Reduction
Accelerating Stochastic Composition Optimization
A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics