The following pages link to (Q5405216):
Displaying 7 items.
- Batch mode reinforcement learning based on the synthesis of artificial trajectories (Q378762)
- Offline reinforcement learning with task hierarchies (Q1698854)
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling (Q2051259)
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics (Q2112808)
- Policy space identification in configurable environments (Q2163245)
- A Q-learning predictive control scheme with guaranteed stability (Q2220029)
- A concentration bound for \(\operatorname{LSPE}( \lambda )\) (Q2677709)