Pages that link to "Item:Q4252717"

From MaRDI portal

The following pages link to (Q4252717):

Displaying 50 items.

On two continuum armed bandit problems in high dimensions (Q260274)
The territorial raider game and graph derangements (Q313781)
Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning (Q413845)
Competitive strategy for on-line leasing of depreciable equipment (Q646109)
A dynamic programming strategy to balance exploration and exploitation in the bandit problem (Q647433)
Modeling item-item similarities for personalized recommendations on Yahoo! front page (Q652346)
Learning dynamic algorithm portfolios (Q870809)
Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments (Q924170)
A reinforcement learning approach to interval constraint propagation (Q941660)
A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems (Q949395)
Competitive collaborative learning (Q959897)
Exponential weight algorithm in continuous time (Q959954)
Playing monotone games to understand learning behaviors (Q974103)
Effective short-term opponent exploitation in simplified poker (Q1009317)
No regrets about no-regret (Q1028930)
On stable social laws and qualitative equilibria (Q1274287)
Sex with no regrets: how sexual reproduction uses a no regret learning algorithm for evolutionary advantage (Q1702267)
On the convergence of reinforcement learning (Q1779805)
Regret in the on-line decision problem (Q1818283)
Adaptive game playing using multiplicative weights (Q1818286)
Conditional universal consistency. (Q1818287)
Minimizing regret: The general case (Q1818295)
Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates (Q1848931)
Apple tasting. (Q1854360)
Online learning in online auctions (Q1887078)
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm (Q1945130)
Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991) (Q1982402)
Gorthaur-EXP3: bandit-based selection from a portfolio of recommendation algorithms balancing the accuracy-diversity dilemma (Q2055544)
Gittins' theorem under uncertainty (Q2076662)
Learning in auctions: regret is hard, envy is easy (Q2155904)
Lipschitzness is all you need to tame off-policy generative adversarial imitation learning (Q2163202)
Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers (Q2251439)
Bounding the inefficiency of outcomes in generalized second price auctions (Q2253843)
Adaptive policies for perimeter surveillance problems (Q2286935)
Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games (Q2303656)
AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents (Q2384141)
Dynamic benchmark targeting (Q2397633)
Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems (Q2479159)
Learning Where to Attend with Deep Architectures for Image Tracking (Q2919435)
Incentivizing Exploration with Heterogeneous Value of Money (Q3460803)
(Q4558161)
(Q4637066)
Competitive On-line Statistics (Q4831997)
OPTIMUM ENERGY FOR ENERGY PACKET NETWORKS (Q4961789)
Reinforcement Learning Based Interactive Agent for Personalized Mathematical Skill Enhancement (Q5014701)
A Bandit-Learning Approach to Multifidelity Approximation (Q5022495)
(Q5053317)
Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes (Q5060501)
(Q5381115)
Functional Sequential Treatment Allocation (Q5881136)

Retrieved from "https://portal.mardi4nfdi.de/wiki/Special:WhatLinksHere/Item:Q4252717"