scientific article; zbMATH DE number 5037120

From MaRDI portal
Publication:5477859

DOI: 10.1023/A:1018056104778, zbMath: 1099.93534, OpenAlex: W4246906609, MaRDI QID: Q5477859

Steven J. Bradtke, Andrew G. Barto

Publication date: 29 June 2006

Published in: Machine Learning

Full work available at URL: https://doi.org/10.1023/a:1018056104778

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.



Related Items

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
Approximate policy iteration: a survey and some new methods
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
Potential-based least-squares policy iteration for a parameterized feedback control system
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
Approximate dynamic programming for the dispatch of military medical evacuation assets
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Perspectives of approximate dynamic programming
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Unnamed Item
Dynamic portfolio choice: a simulation-and-regression approach
Approximate dynamic programming for the military inventory routing problem
Batch mode reinforcement learning based on the synthesis of artificial trajectories
The optimal unbiased value estimator and its relation to LSTD, TD and MC
Recent advances in reinforcement learning in finance
Reinforcement learning algorithms with function approximation: recent advances and applications
Asymptotic analysis of value prediction by well-specified and misspecified models
Hybrid least-squares algorithms for approximate policy evaluation
Dopamine Ramps Are a Consequence of Reward Prediction Errors
A Q-learning predictive control scheme with guaranteed stability
A two-level optimization model for elective surgery scheduling with downstream capacity constraints
Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach
Approximate dynamic programming for missile defense interceptor fire control
Basis function adaptation in temporal difference reinforcement learning
Proximal algorithms and temporal difference methods for solving fixed point problems
Convergence of the standard RLS method and UDU^T factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming
Off-policy temporal difference learning with distribution adaptation in fast mixing chains
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
An approximate dynamic programming approach for comparing firing policies in a networked air defense environment
Projected equation methods for approximate solution of large linear systems
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Challenges of real-world reinforcement learning: definitions, benchmarks and analysis
Natural actor-critic algorithms
Improving defensive air battle management by solving a stochastic dynamic assignment problem via approximate dynamic programming
Unnamed Item
Solving factored MDPs using non-homogeneous partitions