The following pages link to (Q5148951):
Displaying 8 items.
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics (Q2112808)
- Batch policy learning in average reward Markov decision processes (Q2112817)
- (Q5054662)
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning (Q5060503)
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability (Q6183750)
- Projected state-action balancing weights for offline reinforcement learning (Q6183753)
- Settling the sample complexity of model-based offline reinforcement learning (Q6192326)
- Optimal policy evaluation using kernel-based temporal difference methods (Q6656605)