Pages that link to "Item:Q5959973"
From MaRDI portal
The following pages link to Finite-time analysis of the multiarmed bandit problem (Q5959973):
- Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges (Q254442)
- On two continuum armed bandit problems in high dimensions (Q260274)
- General game playing with stochastic CSP (Q265712)
- Batched bandit problems (Q282463)
- Algorithms for computing strategies in two-player simultaneous move games (Q286381)
- An analysis for strength improvement of an MCTS-based program playing Chinese dark chess (Q307781)
- Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search (Q307787)
- LinUCB applied to Monte Carlo tree search (Q307792)
- Infomax strategies for an optimal balance between exploration and exploitation (Q310029)
- Using reinforcement learning to find an optimal set of features (Q316296)
- Response-adaptive designs for clinical trials: simultaneous learning from multiple patients (Q320737)
- Control problems in online advertising and benefits of randomized bidding strategies (Q328167)
- The multi-armed bandit problem with covariates (Q355096)
- Wisdom of crowds versus groupthink: learning in groups and in isolation (Q361811)
- Kullback-Leibler upper confidence bounds for optimal sequential allocation (Q366995)
- Exploration and exploitation of scratch games (Q374139)
- Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search (Q374142)
- Adaptive aggregation for reinforcement learning in average reward Markov decision processes (Q378753)
- Robustness of stochastic bandit policies (Q391739)
- An artificial bee colony algorithm for the job shop scheduling problem with random processing times (Q400942)
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem (Q415624)
- Temporal-difference search in Computer Go (Q420936)
- The \(K\)-armed dueling bandits problem (Q440003)
- Information capture and reuse strategies in Monte Carlo Tree Search, with applications to games of hidden information (Q464622)
- Regret bounds for restless Markov bandits (Q465253)
- MSO: a framework for bound-constrained black-box global optimization algorithms (Q524912)
- Sampled fictitious play for approximate dynamic programming (Q547121)
- Optimal Bayesian strategies for the infinite-armed Bernoulli bandit (Q643377)
- A dynamic programming strategy to balance exploration and exploitation in the bandit problem (Q647433)
- Analyzing bandit-based adaptive operator selection mechanisms (Q647443)
- Modeling item-item similarities for personalized recommendations on Yahoo! front page (Q652346)
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem (Q653803)
- Corruption-tolerant bandit learning (Q669323)
- Boundary crossing probabilities for general exponential families (Q722599)
- Multi-armed bandits with episode context (Q766259)
- Multi-objective simultaneous optimistic optimization (Q781163)
- An asymptotically optimal strategy for constrained multi-armed bandit problems (Q784789)
- A non-parametric solution to the multi-armed bandit problem with covariates (Q826996)
- Optimal control with learning on the fly: a toy problem (Q832436)
- Adaptive-treed bandits (Q888482)
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality (Q892592)
- Bandit-based Monte-Carlo structure learning of probabilistic logic programs (Q894703)
- A perpetual search for talents across overlapping generations: a learning process (Q898767)
- Truthful learning mechanisms for multi-slot sponsored search auctions with externalities (Q899160)
- Online regret bounds for Markov decision processes with deterministic transitions (Q982638)
- Active learning in heteroscedastic noise (Q982644)
- Response adaptive designs that incorporate switching costs and constraints (Q997275)
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (Q1017665)
- Crowdsourcing with unsure option (Q1640566)
- Improving multi-armed bandit algorithms in online pricing settings (Q1644914)