A dynamic programming strategy to balance exploration and exploitation in the bandit problem (Q647433)

scientific article; zbMATH DE number 5977712

Statements

instance of

scholarly article

0 references

title

A dynamic programming strategy to balance exploration and exploitation in the bandit problem (English)

0 references

published in

Annals of Mathematics and Artificial Intelligence

0 references

publication date

23 November 2011

0 references

zbMATH Keywords

multi-armed bandit problem

0 references

greedy

0 references

estimation

0 references

describes a project that uses

bootstrap

0 references

PRMLT

0 references

MaRDI profile type

MaRDI publication profile

0 references

full work available at URL

https://doi.org/10.1007/s10472-010-9190-1

0 references

cites work

Finite-time analysis of the multiarmed bandit problem

0 references

Q4252717

0 references

Q3795523

0 references

Q4257216

0 references

Pattern recognition and machine learning.

0 references

A dynamic programming strategy to balance exploration and exploitation in the bandit problem

0 references

Q4318617

0 references

Q4692329

0 references

The sample average approximation method for stochastic discrete optimization

0 references

Exploration of multi-state environments: Local measures and back-propagation of uncertainty

0 references

Approximate Dynamic Programming

0 references

Q4315289

0 references

Some aspects of the sequential design of experiments

0 references

Identifiers

zbMATH Open document ID

1226.68079

0 references

Mathematics Subject Classification ID

0 references

DOI

10.1007/S10472-010-9190-1

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:647433