Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Publication:1009248

DOI: 10.1007/s10994-007-5038-2
zbMath: 1470.68072
OpenAlex: W2104753538
MaRDI QID: Q1009248

Csaba Szepesvári, András Antos, Rémi Munos

Publication date: 31 March 2009

Published in: Machine Learning

Full work available at URL: https://doi.org/10.1007/s10994-007-5038-2




Related Items (19)

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
Policy space identification in configurable environments
Deep reinforcement trading with predictable returns
Unnamed Item
Model selection in reinforcement learning
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
Hybrid least-squares algorithms for approximate policy evaluation
Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains
Rollout sampling approximate policy iteration
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Estimating optimal shared-parameter dynamic regimens with application to a multistage depression clinical trial
Unnamed Item
A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
Learning When-to-Treat Policies
Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health
Batch policy learning in average reward Markov decision processes




