The following pages link to (Q5396640):
Displayed 8 items.
- Doubly robust policy evaluation and optimization (Q252797)
- Extracting certainty from uncertainty: regret bounded by variation in costs (Q1959595)
- Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm (Q2091834)
- Truthful Mechanisms with Implicit Payment Computation (Q2796397)
- (Q4998863)
- Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards (Q5113912)
- AN ONLINE PORTFOLIO SELECTION ALGORITHM WITH REGRET LOGARITHMIC IN PRICE VARIATION (Q5247422)
- Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization (Q6183761)