The following pages link to (Q2896090):
Displayed 14 items.
- Extreme state aggregation beyond Markov decision processes (Q329613) (← links)
- Adaptive aggregation for reinforcement learning in average reward Markov decision processes (Q378753) (← links)
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model (Q399890) (← links)
- Regret bounds for restless Markov bandits (Q465253) (← links)
- Near-optimal PAC bounds for discounted MDPs (Q465258) (← links)
- Reducing reinforcement learning to KWIK online regression (Q616761) (← links)
- Online regret bounds for Markov decision processes with deterministic transitions (Q982638) (← links)
- Scale-free online learning (Q1704560) (← links)
- Reinforcement Learning in Robust Markov Decision Processes (Q2833106) (← links)
- Robust MDPs with k-Rectangular Uncertainty (Q2833114) (← links)
- Scale-Free Algorithms for Online Linear Optimization (Q2835636) (← links)
- Online Learning in Markov Decision Processes with Continuous Actions (Q2835638) (← links)
- Learning the distribution with largest mean: two bandit frameworks (Q4606431) (← links)
- Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution (Q5244873) (← links)