Publication: 5477859
From MaRDI portal
DOI: 10.1023/A:1018056104778
zbMath: 1099.93534
MaRDI QID: Q5477859
Andrew G. Barto, Steven J. Bradtke
Publication date: 29 June 2006
Published in: Machine Learning
93E24: Least squares and related methods for stochastic control systems
93E35: Stochastic learning and adaptive control
90C40: Markov and semi-Markov decision processes
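The MSC codes above place this publication in least-squares methods for temporal-difference learning. As a purely illustrative aside (the example trajectory, feature map, and discount factor below are assumptions, not taken from the record), a minimal sketch of the standard LSTD estimator — solve A w = b with A accumulating phi(s)(phi(s) - gamma*phi(s'))^T and b accumulating r*phi(s) — might look like:

```python
import numpy as np

def lstd(transitions, phi, gamma, n_features):
    """Least-squares temporal-difference value estimation.

    transitions: iterable of (state, reward, next_state) tuples.
    phi: feature map from state to an n_features vector.
    Returns weights w such that V(s) ~ phi(s) . w.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate phi (phi - gamma phi')^T
        b += r * f                            # accumulate reward-weighted features
    return np.linalg.solve(A, b)

# Hypothetical two-state deterministic cycle: 0 -> 1 (reward 0), 1 -> 0 (reward 1),
# with tabular (one-hot) features, so LSTD recovers the exact values.
phi = lambda s: np.eye(2)[s]
gamma = 0.9
traj = [(0, 0.0, 1), (1, 1.0, 0)] * 5
w = lstd(traj, phi, gamma, 2)
# Bellman check: V(0) = gamma * V(1), V(1) = 1 + gamma * V(0),
# giving V(1) = 1 / (1 - gamma**2) and V(0) = gamma / (1 - gamma**2).
```

With one-hot features the linear system is just the tabular Bellman equations, so the solved weights equal the true state values; with general features, LSTD returns the fixed point of the projected Bellman operator on the feature span.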
Related Items
Batch mode reinforcement learning based on the synthesis of artificial trajectories
The optimal unbiased value estimator and its relation to LSTD, TD and MC
Asymptotic analysis of value prediction by well-specified and misspecified models
Solving factored MDPs using non-homogeneous partitions
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Projected equation methods for approximate solution of large linear systems
Natural actor-critic algorithms
Hybrid least-squares algorithms for approximate policy evaluation
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Basis function adaptation in temporal difference reinforcement learning
Unnamed Item
Approximate policy iteration: a survey and some new methods
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications