Pages that link to "Item:Q5396763"
From MaRDI portal
The following pages link to Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems (Q5396763):
- Approximation algorithms for stochastic combinatorial optimization problems (Q290321)
- LinUCB applied to Monte Carlo tree search (Q307792)
- Kullback-Leibler upper confidence bounds for optimal sequential allocation (Q366995)
- Regret bounds for restless Markov bandits (Q465253)
- Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex (Q510299)
- A continuous-time approach to online optimization (Q520967)
- Boundary crossing probabilities for general exponential families (Q722599)
- Response prediction for low-regret agents (Q776228)
- Optimal control with learning on the fly: a toy problem (Q832436)
- Adaptive-treed bandits (Q888482)
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality (Q892592)
- Bandit-based Monte-Carlo structure learning of probabilistic logic programs (Q894703)
- Truthful learning mechanisms for multi-slot sponsored search auctions with externalities (Q899160)
- Gradient-free two-point methods for solving stochastic nonsmooth convex optimization problems with small non-random noises (Q1616222)
- Strategic conversations under imperfect information: epistemic message exchange games (Q1630949)
- On minimaxity of follow the leader strategy in the stochastic setting (Q1663642)
- A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing (Q1690964)
- Scale-free online learning (Q1704560)
- An optimal bidimensional multi-armed bandit auction for multi-unit procurement (Q1714944)
- Learning in games with continuous action sets and unknown payoff functions (Q1717237)
- A unified framework for stochastic optimization (Q1719609)
- Exploratory distributions for convex functions (Q1737974)
- Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards (Q2006767)
- An online algorithm for the risk-aware restless bandit (Q2029383)
- Multi-armed bandit with sub-exponential rewards (Q2060366)
- A revised approach for risk-averse multi-armed bandits under CVaR criterion (Q2060576)
- On testing transitivity in online preference learning (Q2071345)
- From reinforcement learning to optimal control: a unified framework for sequential decisions (Q2094027)
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning (Q2094051)
- Noisy zeroth-order optimization for non-smooth saddle point problems (Q2104286)
- Order scoring, bandit learning and order cancellations (Q2115951)
- Adaptive large neighborhood search for mixed integer programming (Q2146445)
- Learning in auctions: regret is hard, envy is easy (Q2155904)
- A reliability-aware multi-armed bandit approach to learn and select users in demand response (Q2207171)
- An adversarial model for scheduling with testing (Q2211361)
- Undiscounted bandit games (Q2212738)
- Ballooning multi-armed bandits (Q2238588)
- Mistake bounds on the noise-free multi-armed bandit game (Q2280334)
- Active ranking from pairwise comparisons and when parametric assumptions do not help (Q2284367)
- Adaptive policies for perimeter surveillance problems (Q2286935)
- New bounds on the price of bandit feedback for mistake-bounded online multiclass learning (Q2290693)
- A note on a tight lower bound for capacitated MNL-bandit assortment selection models (Q2294230)
- Concentration bounds for empirical conditional value-at-risk: the unbounded case (Q2294256)
- Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games (Q2303656)
- Meta-inductive prediction based on attractivity weighting: mathematical and empirical performance evaluation (Q2332810)
- On the efficiency of a randomized mirror descent algorithm in online optimization problems (Q2354481)
- Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case (Q2397263)
- Multi-armed bandits based on a variant of simulated annealing (Q2520136)
- Mechanisms with learning for stochastic multi-armed bandit problems (Q2520139)
- Distributed cooperative decision making in multi-agent multi-armed bandits (Q2663944)