Pages that link to "Item:Q5396763"

From MaRDI portal

The following pages link to Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems (Q5396763):

Displaying 50 items.

Approximation algorithms for stochastic combinatorial optimization problems (Q290321)
LinUCB applied to Monte Carlo tree search (Q307792)
Kullback-Leibler upper confidence bounds for optimal sequential allocation (Q366995)
Regret bounds for restless Markov bandits (Q465253)
Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex (Q510299)
A continuous-time approach to online optimization (Q520967)
Boundary crossing probabilities for general exponential families (Q722599)
Response prediction for low-regret agents (Q776228)
Optimal control with learning on the fly: a toy problem (Q832436)
Adaptive-treed bandits (Q888482)
Combining multiple strategies for multiarmed bandit problems and asymptotic optimality (Q892592)
Bandit-based Monte-Carlo structure learning of probabilistic logic programs (Q894703)
Truthful learning mechanisms for multi-slot sponsored search auctions with externalities (Q899160)
Gradient-free two-point methods for solving stochastic nonsmooth convex optimization problems with small non-random noises (Q1616222)
Strategic conversations under imperfect information: epistemic message exchange games (Q1630949)
On minimaxity of follow the leader strategy in the stochastic setting (Q1663642)
A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing (Q1690964)
Scale-free online learning (Q1704560)
An optimal bidimensional multi-armed bandit auction for multi-unit procurement (Q1714944)
Learning in games with continuous action sets and unknown payoff functions (Q1717237)
A unified framework for stochastic optimization (Q1719609)
Exploratory distributions for convex functions (Q1737974)
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards (Q2006767)
An online algorithm for the risk-aware restless bandit (Q2029383)
Multi-armed bandit with sub-exponential rewards (Q2060366)
A revised approach for risk-averse multi-armed bandits under CVaR criterion (Q2060576)
On testing transitivity in online preference learning (Q2071345)
From reinforcement learning to optimal control: a unified framework for sequential decisions (Q2094027)
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning (Q2094051)
Noisy zeroth-order optimization for non-smooth saddle point problems (Q2104286)
Order scoring, bandit learning and order cancellations (Q2115951)
Adaptive large neighborhood search for mixed integer programming (Q2146445)
Learning in auctions: regret is hard, envy is easy (Q2155904)
A reliability-aware multi-armed bandit approach to learn and select users in demand response (Q2207171)
An adversarial model for scheduling with testing (Q2211361)
Undiscounted bandit games (Q2212738)
Ballooning multi-armed bandits (Q2238588)
Mistake bounds on the noise-free multi-armed bandit game (Q2280334)
Active ranking from pairwise comparisons and when parametric assumptions do not help (Q2284367)
Adaptive policies for perimeter surveillance problems (Q2286935)
New bounds on the price of bandit feedback for mistake-bounded online multiclass learning (Q2290693)
A note on a tight lower bound for capacitated MNL-bandit assortment selection models (Q2294230)
Concentration bounds for empirical conditional value-at-risk: the unbounded case (Q2294256)
Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games (Q2303656)
Meta-inductive prediction based on attractivity weighting: mathematical and empirical performance evaluation (Q2332810)
On the efficiency of a randomized mirror descent algorithm in online optimization problems (Q2354481)
Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case (Q2397263)
Multi-armed bandits based on a variant of simulated annealing (Q2520136)
Mechanisms with learning for stochastic multi-armed bandit problems (Q2520139)
Distributed cooperative decision making in multi-agent multi-armed bandits (Q2663944)