Pages that link to "Item:Q1009248"
From MaRDI portal
The following pages link to Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (Q1009248):
Displaying 13 items.
- Model selection in reinforcement learning (Q415618)
- Hybrid least-squares algorithms for approximate policy evaluation (Q1959511)
- Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains (Q1959632)
- Rollout sampling approximate policy iteration (Q2036256)
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling (Q2051259)
- Batch policy learning in average reward Markov decision processes (Q2112817)
- Policy space identification in configurable environments (Q2163245)
- Estimating optimal shared-parameter dynamic regimens with application to a multistage depression clinical trial (Q2827199)
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications (Q2887630)
- A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation (Q5003727)
- Deep reinforcement trading with predictable returns (Q6098411)
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability (Q6183750)
- Value iteration for streaming data on a continuous space with gradient method in an RKHS (Q6488837)