Pages that link to "Item:Q4785631"
From MaRDI portal
The following pages link to The Nonstochastic Multiarmed Bandit Problem (Q4785631):
- Doubly robust policy evaluation and optimization (Q252797)
- On two continuum armed bandit problems in high dimensions (Q260274)
- Algorithms for computing strategies in two-player simultaneous move games (Q286381)
- Bandit online optimization over the permutahedron (Q329616)
- Generalized mirror descents in congestion games (Q334813)
- Exploration and exploitation of scratch games (Q374139)
- Algorithm portfolio selection as a bandit problem with unbounded losses (Q408989)
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem (Q415624)
- Combinatorial bandits (Q439986)
- Learning with stochastic inputs and adversarial outputs (Q439998)
- The \(K\)-armed dueling bandits problem (Q440003)
- Regret bounds for restless Markov bandits (Q465253)
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem (Q653803)
- Corruption-tolerant bandit learning (Q669323)
- Multi-armed bandits with episode context (Q766259)
- Replicator dynamics: old and new (Q828036)
- Learning dynamic algorithm portfolios (Q870809)
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality (Q892592)
- A perpetual search for talents across overlapping generations: a learning process (Q898767)
- Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments (Q924170)
- Regret minimization in repeated matrix games with variable stage duration (Q926893)
- A reinforcement learning approach to interval constraint propagation (Q941660)
- Competitive collaborative learning (Q959897)
- Exponential weight algorithm in continuous time (Q959954)
- Online regret bounds for Markov decision processes with deterministic transitions (Q982638)
- A payoff-based learning procedure and its application to traffic games (Q993787)
- Perspectives on multiagent learning (Q1028921)
- Multi-agent learning for engineers (Q1028926)
- Improving multi-armed bandit algorithms in online pricing settings (Q1644914)
- Two queues with non-stochastic arrivals (Q1667172)
- Randomized prediction of individual sequences (Q1733293)
- Selective harvesting over networks (Q1741374)
- Online learning in online auctions (Q1887078)
- Online multiple kernel classification (Q1945032)
- Extracting certainty from uncertainty: regret bounded by variation in costs (Q1959595)
- Regret bounds for sleeping experts and bandits (Q1959599)
- Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks (Q2021298)
- Gorthaur-EXP3: bandit-based selection from a portfolio of recommendation algorithms balancing the accuracy-diversity dilemma (Q2055544)
- Gittins' theorem under uncertainty (Q2076662)
- Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit (Q2081727)
- Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm (Q2091834)
- Multi-agent reinforcement learning: a selective overview of theories and algorithms (Q2094040)
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning (Q2094051)
- Robust control of the multi-armed bandit problem (Q2095215)
- MedleySolver: online SMT algorithm selection (Q2118336)
- Adaptive large neighborhood search for mixed integer programming (Q2146445)
- Dynamic pricing with finite price sets: a non-parametric approach (Q2238754)
- Filtered Poisson process bandit on a continuum (Q2239901)
- Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers (Q2251439)
- BoostingTree: parallel selection of weak learners in boosting, with application to ranking (Q2251442)