The following pages link to (Q5148951):
Displayed 8 items.
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics (Q2112808)
- Batch policy learning in average reward Markov decision processes (Q2112817)
- (Q5054662)
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning (Q5060503)
- Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons (Q6153987)
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability (Q6183750)
- Projected state-action balancing weights for offline reinforcement learning (Q6183753)
- Settling the sample complexity of model-based offline reinforcement learning (Q6192326)