The following pages link to (Q2896090):
Displaying 37 items.
- Extreme state aggregation beyond Markov decision processes (Q329613) (← links)
- Adaptive aggregation for reinforcement learning in average reward Markov decision processes (Q378753) (← links)
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model (Q399890) (← links)
- Regret bounds for restless Markov bandits (Q465253) (← links)
- Near-optimal PAC bounds for discounted MDPs (Q465258) (← links)
- Reducing reinforcement learning to KWIK online regression (Q616761) (← links)
- Online regret bounds for Markov decision processes with deterministic transitions (Q982638) (← links)
- Scale-free online learning (Q1704560) (← links)
- Multi-agent reinforcement learning: a selective overview of theories and algorithms (Q2094040) (← links)
- Lipschitzness is all you need to tame off-policy generative adversarial imitation learning (Q2163202) (← links)
- Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies (Q2318167) (← links)
- Bayesian optimistic Kullback-Leibler exploration (Q2425228) (← links)
- Reinforcement Learning in Robust Markov Decision Processes (Q2833106) (← links)
- Robust MDPs with k-Rectangular Uncertainty (Q2833114) (← links)
- Scale-Free Algorithms for Online Linear Optimization (Q2835636) (← links)
- Online Learning in Markov Decision Processes with Continuous Actions (Q2835638) (← links)
- Learning the distribution with largest mean: two bandit frameworks (Q4606431) (← links)
- Learning to Optimize via Information-Directed Sampling (Q4969321) (← links)
- Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach (Q4994160) (← links)
- A Bandit-Learning Approach to Multifidelity Approximation (Q5022495) (← links)
- Temporal concatenation for Markov decision processes (Q5051192) (← links)
- (Q5053203) (← links)
- (Q5053310) (← links)
- Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management (Q5095166) (← links)
- Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations (Q5139834) (← links)
- Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items (Q5144768) (← links)
- (Q5148991) (← links)
- (Q5214215) (← links)
- Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution (Q5244873) (← links)
- Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time (Q6106432) (← links)
- Pessimistic value iteration for multi-task data sharing in offline reinforcement learning (Q6152665) (← links)
- Provably efficient reinforcement learning in decentralized general-sum Markov games (Q6159512) (← links)
- Settling the sample complexity of model-based offline reinforcement learning (Q6192326) (← links)
- Value iteration for streaming data on a continuous space with gradient method in an RKHS (Q6488837) (← links)
- Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization (Q6566614) (← links)
- Logarithmic regret bounds for continuous-time average-reward Markov decision processes (Q6608781) (← links)
- Learning optimal admission control in partially observable queueing networks (Q6623440) (← links)