scientific article
Publication:3093234
zbMath: 1222.68207; MaRDI QID: Q3093234
Evan Greensmith, Peter L. Bartlett, Jonathan Baxter
Publication date: 12 October 2011
Full work available at URL: http://www.jmlr.org/papers/v5/greensmith04a.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Markov processes: estimation; hidden Markov models (62M05) ⋮ Learning and adaptive systems in artificial intelligence (68T05)
Related Items
The factored policy-gradient planner ⋮ Adaptive playouts for online learning of policies during Monte Carlo tree search ⋮ Learning to control a structured-prediction decoder for detection of HTTP-layer DDoS attackers ⋮ Unnamed Item ⋮ Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization ⋮ Personalized dynamic treatment regimes in continuous time: a Bayesian approach for optimizing clinical decisions with timing ⋮ A Bayesian decision framework for optimizing sequential combination antiretroviral therapy in people with HIV ⋮ Scalable Control Variates for Monte Carlo Methods Via Stochastic Optimization ⋮ Optimised graded metamaterials for mechanical energy confinement and amplification via reinforcement learning ⋮ Analysis and improvement of policy gradient estimation ⋮ Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration ⋮ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies ⋮ Deep Reinforcement Learning: A State-of-the-Art Walkthrough ⋮ On-line policy gradient estimation with multi-step sampling ⋮ Importance sampling in reinforcement learning with an estimated behavior policy ⋮ TD-regularized actor-critic methods ⋮ Natural actor-critic algorithms ⋮ Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients