Pages that link to "Item:Q3093948"
From MaRDI portal
The following pages link to On Upper-Confidence Bound Policies for Switching Bandit Problems (Q3093948):
- Tracking the market: dynamic pricing and learning in a changing environment (Q320123)
- Context tree selection: a unifying view (Q719769)
- Improving multi-armed bandit algorithms in online pricing settings (Q1644914)
- Order scoring, bandit learning and order cancellations (Q2115951)
- Lipschitzness is all you need to tame off-policy generative adversarial imitation learning (Q2163202)
- Learning the distribution with largest mean: two bandit frameworks (Q4606431)
- Finite-Time Analysis for the Knowledge-Gradient Policy (Q4610155)
- (Q4998863)
- (Q5053221)
- Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards (Q5113912)
- Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers (Q5880072)