Pages that link to "Item:Q1017665"

From MaRDI portal

The following pages link to Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (Q1017665):

Displayed 38 items.

Optimal learning with Bernstein Online Aggregation (Q72768)
Kullback-Leibler upper confidence bounds for optimal sequential allocation (Q366995)
Exploration and exploitation of scratch games (Q374139)
Robustness of stochastic bandit policies (Q391739)
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem (Q653803)
Corruption-tolerant bandit learning (Q669323)
Boundary crossing probabilities for general exponential families (Q722599)
Multi-armed bandits with episode context (Q766259)
Improving multi-armed bandit algorithms in online pricing settings (Q1644914)
An adaptive and robust biological network based on the vacant-particle transportation model (Q1670644)
On Bayesian index policies for sequential resource allocation (Q1750289)
Time-uniform, nonparametric, nonasymptotic confidence sequences (Q2039804)
Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit (Q2081727)
A PAC algorithm in relative precision for bandit problem with costly sampling (Q2084297)
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning (Q2094051)
Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers (Q2251439)
Mixing time estimation in reversible Markov chains from a single sample path (Q2330466)
Bayesian optimistic Kullback-Leibler exploration (Q2425228)
Constructing effective personalized policies using counterfactual inference from biased data sets with many features (Q2425241)
Pure exploration in finitely-armed and continuous-armed bandits (Q2431430)
Concentration inequalities for sampling without replacement (Q2515502)
Primal-Dual Algorithms for Optimization with Stochastic Dominance (Q2954172)
(Q4558206)
(Q4558474)
Adaptive Sampling Strategies for Stochastic Optimization (Q4562248)
Finite-Time Analysis for the Knowledge-Gradient Policy (Q4610155)
Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach (Q4994160)
(Q4998881)
(Q5053221)
Nonasymptotic Analysis of Monte Carlo Tree Search (Q5060499)
Setting Reserve Prices in Second-Price Auctions with Unobserved Bids (Q5060778)
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET (Q5070864)
Data-Driven Decisions for Problems with an Unspecified Objective Function (Q5137432)
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT (Q5358116)
Functional Sequential Treatment Allocation (Q5881136)
Robust supervised learning with coordinate gradient descent (Q6172182)
Multi-armed linear bandits with latent biases (Q6198758)
A confirmation of a conjecture on Feldman’s two-armed bandit problem (Q6198964)

Retrieved from "https://portal.mardi4nfdi.de/wiki/Special:WhatLinksHere/Item:Q1017665"