On ergodic two-armed bandits (Q417067): Difference between revisions

This paper studies the Narendra two-armed bandit algorithm under the assumption that the payoff sequences are unknown and deterministic. The authors consider the Narendra algorithm under the conditions required by \textit{D. Lamberton, G. Pagès} and \textit{P. Tarrès} [Ann. Appl. Probab. 14, No. 3, 1424--1454 (2004; Zbl 1048.62079)], but without monotonicity and under weak ergodic assumptions. The results show that, even with strongly dependent outcomes, the payoff probability accumulates statistical information on the ergodic behaviour of the two arms and thereby induces the appropriate decision.
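For orientation, the update rule usually associated with this algorithm (a linear reward-inaction scheme) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `narendra_bandit`, its parameters, and the step sequence `1/(n+1)` are assumptions chosen for the example and are not taken from the review; the payoffs are assumed to be 0/1 sequences fixed in advance.

```python
import random


def narendra_bandit(payoff_a, payoff_b, x0=0.5, step=lambda n: 1.0 / (n + 1), seed=0):
    """Hypothetical sketch of a linear reward-inaction two-armed bandit.

    payoff_a, payoff_b: deterministic 0/1 payoff sequences of the two arms.
    x0: initial probability of playing arm A.
    step: step sequence gamma_n with values in (0, 1) (illustrative choice).
    Returns the list of probabilities x_0, x_1, ..., x_N of playing arm A.
    """
    rng = random.Random(seed)  # the only randomness is the choice of arm
    x = x0
    history = [x]
    for n, (eta_a, eta_b) in enumerate(zip(payoff_a, payoff_b), start=1):
        gamma = step(n)
        if rng.random() <= x:
            # Arm A is played: move x towards 1 only if A pays off.
            x += gamma * eta_a * (1.0 - x)
        else:
            # Arm B is played: move x towards 0 only if B pays off.
            x -= gamma * eta_b * x
        history.append(x)
    return history


if __name__ == "__main__":
    # Arm A pays two times out of three, arm B one time out of three.
    a = [1 if i % 3 else 0 for i in range(3000)]
    b = [0 if i % 3 else 1 for i in range(3000)]
    print(narendra_bandit(a, b)[-1])
```

Nothing happens when the played arm does not pay off, which is why the scheme is called reward-inaction; the probability of playing arm A therefore stays in [0, 1] whenever the steps lie in (0, 1).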

reviewed by

Krzysztof Piasecki

zbMATH Keywords

convergence

stochastic algorithms

Identifiers

zbMATH Open document ID

1275.62056

DOI

10.1214/10-AAP751

Sitelinks

Mathematics (1 entry)

mardi Publication:417067

Revision as of 04:43, 30 January 2024 by Import240129110155: Added link to MaRDI item.
Revision as of 16:10, 14 February 2024 by RedirectionBot: Removed claim: author (P16): Item:Q417066
Property / author
	~~Pierre Vandekerkhove~~
Property / author: Pierre Vandekerkhove / rank
	~~Normal rank~~