Basis function adaptation in temporal difference reinforcement learning
DOI: 10.1007/s10479-005-5732-z · zbMath: 1075.90073 · OpenAlex: W1998172110 · MaRDI QID: Q2485935
Ishai Menache, Nahum Shimkin, Shie Mannor
Publication date: 5 August 2005
Published in: Annals of Operations Research
Full work available at URL: https://doi.org/10.1007/s10479-005-5732-z
Mathematics Subject Classification:
- Management decision making, including multiple objectives (90B50)
- Markov and semi-Markov decision processes (90C40)
- Methods of reduced gradient type (90C52)
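The record carries no abstract, but the title names a concrete technique: temporal difference (TD) learning with a linear function approximator whose basis functions are tuned during learning rather than fixed in advance. As a rough illustration only, the sketch below runs generic gradient-based adaptation of Gaussian radial basis functions inside TD(0) policy evaluation on a toy random-walk chain. The environment, the RBF parameterization, the step sizes, and the adaptation rule are all assumptions made for this sketch; it is not the algorithm of Menache, Shimkin, and Mannor, whose paper (per the cited works below) also draws on cross-entropy-based optimization.

```python
# Illustrative sketch only (not the authors' implementation): TD(0) policy
# evaluation with linear function approximation over Gaussian RBFs, where the
# basis centers and widths are themselves adapted by a slower gradient step
# on the squared TD error. Environment and all constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 21          # states 0..20; 0 and 20 are terminal
N_BASIS = 5            # number of Gaussian radial basis functions
GAMMA = 1.0            # undiscounted episodic task
ALPHA_W = 0.1          # step size for the linear weights
ALPHA_PHI = 0.01       # slower step size for the basis parameters

centers = np.linspace(2.0, 18.0, N_BASIS)  # adaptable RBF centers
widths = np.full(N_BASIS, 3.0)             # adaptable RBF widths
w = np.zeros(N_BASIS)                      # linear weights

def features(s):
    """Gaussian RBF feature vector phi(s)."""
    return np.exp(-0.5 * ((s - centers) / widths) ** 2)

def value(s):
    return features(s) @ w

for episode in range(2000):
    s = 10  # start in the middle of the chain
    while s not in (0, N_STATES - 1):
        s_next = s + rng.choice([-1, 1])       # symmetric random walk
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        v_next = 0.0 if s_next in (0, N_STATES - 1) else value(s_next)
        delta = r + GAMMA * v_next - value(s)  # TD error

        phi = features(s)
        w += ALPHA_W * delta * phi             # standard linear TD(0) update

        # Basis adaptation: semi-gradient of 0.5*delta^2 w.r.t. centers and
        # widths, obtained by the chain rule through phi.
        dphi_dc = phi * (s - centers) / widths ** 2
        dphi_ds = phi * (s - centers) ** 2 / widths ** 3
        centers += ALPHA_PHI * delta * w * dphi_dc
        widths += ALPHA_PHI * delta * w * dphi_ds
        widths = np.clip(widths, 0.5, None)    # keep widths well-behaved

        s = s_next

print("approximate values:", [round(value(s), 2) for s in range(1, N_STATES - 1)])
```

Running the two updates on separate time scales (fast weights, slow basis parameters) mirrors the general two-time-scale pattern that several of the related items below analyze; the specific rates above are illustrative choices.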
Related Items
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
- Approximate policy iteration: a survey and some new methods
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- An Incremental Fast Policy Search Using a Single Sample Path
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
- Reinforcement learning for a biped robot based on a CPG-actor-critic method
- Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
- Model selection in reinforcement learning
- A tutorial on the cross-entropy method
- Approximate dynamic programming via direct search in the space of value function approximations
- Projected equation methods for approximate solution of large linear systems
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
- Learning Tetris Using the Noisy Cross-Entropy Method
- Approximate dynamic programming via iterated Bellman inequalities
- Actor-Critic Algorithms with Online Feature Adaptation
Cites Work
- Technical update: Least-squares temporal difference learning
- The cross-entropy method for combinatorial and continuous optimization
- A tutorial on the cross-entropy method
- Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment
- An adaptive optimal controller for discrete-time Markov environments
- An analysis of temporal-difference learning with function approximation