scientific article; zbMATH DE number 1753153
From MaRDI portal
Publication:4533363
zbMath: 0994.68187; MaRDI QID: Q4533363
Lex Weaver, Jonathan Baxter, Peter L. Bartlett
Publication date: 10 October 2002
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Nonnumerical algorithms (68W05); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
Related Items
The factored policy-gradient planner
Active inference and agency: optimal control without cost functions
Finding optimal memoryless policies of POMDPs under the expected average reward criterion
Analysis and improvement of policy gradient estimation
Structured prediction with reinforcement learning
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans
A tutorial on the cross-entropy method
Basic ideas for event-based optimization of Markov systems
On-line policy gradient estimation with multi-step sampling
ARMed SPHINCS
Does lifelong learning affect mobile robot evolution?