scientific article
Publication:2834459
zbMath: 1392.68345; MaRDI QID: Q2834459
Amir-massoud Farahmand, Csaba Szepesvári, Mohammad Ghavamzadeh, Shie Mannor
Publication date: 22 November 2016
Full work available at URL: http://jmlr.csail.mit.edu/papers/v17/13-016.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Keywords: regularization; reinforcement learning; approximate policy iteration; finite-sample analysis; non-parametric method
Learning and adaptive systems in artificial intelligence (68T05); Markov and semi-Markov decision processes (90C40); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
Related Items
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications ⋮ A mathematical perspective of machine learning ⋮ A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets ⋮ Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning ⋮ Projected state-action balancing weights for offline reinforcement learning ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ Learning When-to-Treat Policies ⋮ Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health ⋮ Batch policy learning in average reward Markov decision processes