An incremental off-policy search in a model-free Markov decision process using a single sample path
Publication: Q1621868
DOI: 10.1007/s10994-018-5697-1
zbMath: 1465.90116
arXiv: 1801.10287
OpenAlex: W2963057120
MaRDI QID: Q1621868
Ajin George Joseph, Shalabh Bhatnagar
Publication date: 12 November 2018
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1801.10287
Keywords: global optimization; Markov decision process; control problem; cross entropy method; linear function approximation; stochastic approximation method; ODE method; off-policy prediction
Cites Work
- Model-based search for combinatorial optimization: A critical survey
- The cross-entropy method for continuous multi-extremal optimization
- Natural actor-critic algorithms
- On diagonal dominance arguments for bounding \(\| A^{-1}\|_\infty\)
- A note on entrywise perturbation theory for Markov chains
- Simulation-based algorithms for Markov decision processes
- The cross-entropy method for combinatorial and continuous optimization
- Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment
- Basis function adaptation in temporal difference reinforcement learning
- Policy Iteration Based on Stochastic Factorization
- Importance Sampling for Stochastic Simulations
- A Model Reference Adaptive Search Method for Global Optimization
- Learning control of finite Markov chains with an explicit trade-off between estimation and control
- Adaptive aggregation methods for infinite horizon dynamic programming
- Optimal adaptive controllers for unknown Markov chains
- Learning control of finite Markov chains with unknown transition probabilities
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- Acceleration of Stochastic Approximation by Averaging
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- Cross-entropy and rare events for maximal cut and partition problems
- DOI: 10.1162/1532443041827907
- Least Squares Temporal Difference Methods: An Analysis under General Conditions
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
- A Stochastic Approximation Framework for a Class of Randomized Optimization Algorithms
- Parameter Estimation for ODEs Using a Cross-Entropy Approach
- Handbook of Markov decision processes. Methods and applications