scientific article; zbMATH DE number 5037120

From MaRDI portal

Publication:5477859

DOI: 10.1023/A:1018056104778
zbMath: 1099.93534
OpenAlex: W4246906609
Wikidata: Q56095426
Scholia: Q56095426
MaRDI QID: Q5477859

Steven J. Bradtke, Andrew G. Barto

Publication date: 29 June 2006

Published in: Machine Learning

Full work available at URL: https://doi.org/10.1023/a:1018056104778

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.



Related Items (37)

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
Approximate policy iteration: a survey and some new methods
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
Potential-based least-squares policy iteration for a parameterized feedback control system
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
Approximate dynamic programming for the dispatch of military medical evacuation assets
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Perspectives of approximate dynamic programming
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Unnamed Item
Dynamic portfolio choice: a simulation-and-regression approach
Approximate dynamic programming for the military inventory routing problem
Batch mode reinforcement learning based on the synthesis of artificial trajectories
The optimal unbiased value estimator and its relation to LSTD, TD and MC
Recent advances in reinforcement learning in finance
Reinforcement learning algorithms with function approximation: recent advances and applications
Asymptotic analysis of value prediction by well-specified and misspecified models
Hybrid least-squares algorithms for approximate policy evaluation
Dopamine Ramps Are a Consequence of Reward Prediction Errors
A Q-learning predictive control scheme with guaranteed stability
A two-level optimization model for elective surgery scheduling with downstream capacity constraints
Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach
Approximate dynamic programming for missile defense interceptor fire control
Basis function adaptation in temporal difference reinforcement learning
Proximal algorithms and temporal difference methods for solving fixed point problems
Convergence of the standard RLS method and UDU^T factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming
Off-policy temporal difference learning with distribution adaptation in fast mixing chains
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
An approximate dynamic programming approach for comparing firing policies in a networked air defense environment
Projected equation methods for approximate solution of large linear systems
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Challenges of real-world reinforcement learning: definitions, benchmarks and analysis
Natural actor-critic algorithms
Improving defensive air battle management by solving a stochastic dynamic assignment problem via approximate dynamic programming
Unnamed Item
Solving factored MDPs using non-homogeneous partitions






