The following pages link to (Q4252717):
- On two continuum armed bandit problems in high dimensions (Q260274)
- The territorial raider game and graph derangements (Q313781)
- Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning (Q413845)
- Competitive strategy for on-line leasing of depreciable equipment (Q646109)
- A dynamic programming strategy to balance exploration and exploitation in the bandit problem (Q647433)
- Modeling item-item similarities for personalized recommendations on Yahoo! front page (Q652346)
- Learning dynamic algorithm portfolios (Q870809)
- Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments (Q924170)
- A reinforcement learning approach to interval constraint propagation (Q941660)
- A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems (Q949395)
- Competitive collaborative learning (Q959897)
- Exponential weight algorithm in continuous time (Q959954)
- Playing monotone games to understand learning behaviors (Q974103)
- Effective short-term opponent exploitation in simplified poker (Q1009317)
- No regrets about no-regret (Q1028930)
- On stable social laws and qualitative equilibria (Q1274287)
- Sex with no regrets: how sexual reproduction uses a no regret learning algorithm for evolutionary advantage (Q1702267)
- On the convergence of reinforcement learning (Q1779805)
- Regret in the on-line decision problem (Q1818283)
- Adaptive game playing using multiplicative weights (Q1818286)
- Conditional universal consistency. (Q1818287)
- Minimizing regret: The general case (Q1818295)
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates (Q1848931)
- Apple tasting. (Q1854360)
- Online learning in online auctions (Q1887078)
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm (Q1945130)
- Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991) (Q1982402)
- Gorthaur-EXP3: bandit-based selection from a portfolio of recommendation algorithms balancing the accuracy-diversity dilemma (Q2055544)
- Gittins' theorem under uncertainty (Q2076662)
- Learning in auctions: regret is hard, envy is easy (Q2155904)
- Lipschitzness is all you need to tame off-policy generative adversarial imitation learning (Q2163202)
- Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers (Q2251439)
- Bounding the inefficiency of outcomes in generalized second price auctions (Q2253843)
- Adaptive policies for perimeter surveillance problems (Q2286935)
- Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games (Q2303656)
- AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents (Q2384141)
- Dynamic benchmark targeting (Q2397633)
- Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems (Q2479159)
- Learning Where to Attend with Deep Architectures for Image Tracking (Q2919435)
- Incentivizing Exploration with Heterogeneous Value of Money (Q3460803)
- (Q4558161)
- (Q4637066)
- Competitive On-line Statistics (Q4831997)
- OPTIMUM ENERGY FOR ENERGY PACKET NETWORKS (Q4961789)
- Reinforcement Learning Based Interactive Agent for Personalized Mathematical Skill Enhancement (Q5014701)
- A Bandit-Learning Approach to Multifidelity Approximation (Q5022495)
- (Q5053317)
- Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes (Q5060501)
- (Q5381115)
- Functional Sequential Treatment Allocation (Q5881136)