scientific article; zbMATH DE number 1306865

Yoav Freund, Peter Auer, Nicolò Cesa-Bianchi, Robert E. Schapire

Publication date: 26 April 2000

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

Related Items

Functional Sequential Treatment Allocation, Online learning in online auctions, The territorial raider game and graph derangements, Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes, AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents, Incentivizing Exploration with Heterogeneous Value of Money, Learning in auctions: regret is hard, envy is easy, Learning Where to Attend with Deep Architectures for Image Tracking, Lipschitzness is all you need to tame off-policy generative adversarial imitation learning, Dynamic benchmark targeting, Competitive On-line Statistics, Learning dynamic algorithm portfolios, Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems, Portfolio allocation strategy for active learning kriging-based structural reliability analysis, A unified stochastic approximation framework for learning in games, Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning, Provably efficient reinforcement learning in decentralized general-sum Markov games, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Transfer learning for contextual multi-armed bandits, A confirmation of a conjecture on Feldman’s two-armed bandit problem, Sex with no regrets: how sexual reproduction uses a no regret learning algorithm for evolutionary advantage, Competitive strategy for on-line leasing of depreciable equipment, A dynamic programming strategy to balance exploration and exploitation in the bandit problem, Unnamed Item, Modeling item-item similarities for personalized recommendations on Yahoo! front page, Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments, A reinforcement learning approach to interval constraint propagation, Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991), A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems, Unnamed Item, Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers, Competitive collaborative learning, Exponential weight algorithm in continuous time, Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems, Bounding the inefficiency of outcomes in generalized second price auctions, Learning in network contexts: experimental results from simulations, Multiagent cooperative search for portfolio selection, A general class of adaptive strategies, Playing monotone games to understand learning behaviors, Unnamed Item, On the convergence of reinforcement learning, Adaptive policies for perimeter surveillance problems, OPTIMUM ENERGY FOR ENERGY PACKET NETWORKS, Effective short-term opponent exploitation in simplified poker, Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games, Gorthaur-EXP3: bandit-based selection from a portfolio of recommendation algorithms balancing the accuracy-diversity dilemma, Unnamed Item, Regret in the on-line decision problem, Adaptive game playing using multiplicative weights, Conditional universal consistency., Minimizing regret: The general case, On stable social laws and qualitative equilibria, No regrets about no-regret, Gittins' theorem under uncertainty, Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates, Apple tasting., Unnamed Item, Reinforcement Learning Based Interactive Agent for Personalized Mathematical Skill Enhancement, A Bandit-Learning Approach to Multifidelity Approximation, On two continuum armed bandit problems in high dimensions

Uses Software

AdaBoost.MH