scientific article; zbMATH DE number 1753153

From MaRDI portal

Publication:4533363

zbMath: 0994.68187; MaRDI QID: Q4533363

Lex Weaver, Jonathan Baxter, Peter L. Bartlett

Publication date: 10 October 2002

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

observable Markov decision process

Mathematics Subject Classification ID

Nonnumerical algorithms (68W05); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)

Related Items

The factored policy-gradient planner, Active inference and agency: optimal control without cost functions, Finding optimal memoryless policies of POMDPs under the expected average reward criterion, Analysis and improvement of policy gradient estimation, Structured prediction with reinforcement learning, Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans, A tutorial on the cross-entropy method, Basic ideas for event-based optimization of Markov systems, On-line policy gradient estimation with multi-step sampling, ARMed SPHINCS, Does lifelong learning affect mobile robot evolution?