On ergodic two-armed bandits (Q417067)
From MaRDI portal
scientific article; zbMATH DE number 6034161
| Language | Label | Description | Also known as |
|---|---|---|---|
| default for all languages | No label defined | | |
| English | On ergodic two-armed bandits | scientific article; zbMATH DE number 6034161 | |
Statements
On ergodic two-armed bandits (English)
13 May 2012
This paper focuses on the Narendra two-armed bandit algorithm under the assumption that the payoff sequences are unknown and deterministic. The authors consider the Narendra algorithm under the conditions required by D. Lamberton, G. Pagès and P. Tarrès [Ann. Appl. Probab. 14, No. 3, 1424–1454 (2004; Zbl 1048.62079)], without monotonicity and under weak ergodic assumptions. The results show that, even with strongly dependent outcomes, the payoff probability accumulates enough statistical information on the ergodic behaviour of the two arms to induce the appropriate decision.
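For readers unfamiliar with the algorithm named in the review, the following is a minimal sketch of the standard linear reward-inaction update usually associated with the Narendra two-armed bandit: the probability of playing arm A is pushed toward 1 when A is played and pays, pushed toward 0 when B is played and pays, and left unchanged otherwise. The function names, the step sequence `1 / (n + 1)`, and the periodic payoff sequences are illustrative assumptions, not taken from the paper, and the step sequence is not claimed to satisfy the paper's conditions.

```python
import random


def narendra_step(x, gamma, reward_a, reward_b, u):
    """One update of x, the probability of playing arm A.

    u is a uniform draw in [0, 1): arm A is played when u <= x, else arm B.
    gamma is the step size, assumed to lie in (0, 1).
    """
    if u <= x:
        # Arm A was played; reinforce A only if it paid off.
        if reward_a:
            x += gamma * (1.0 - x)
    else:
        # Arm B was played; reinforce B (decrease x) only if it paid off.
        if reward_b:
            x -= gamma * x
    return x


def run_narendra(rewards_a, rewards_b, x0=0.5, seed=0):
    """Run the update along two deterministic 0/1 payoff sequences."""
    rng = random.Random(seed)
    x = x0
    for n, (ra, rb) in enumerate(zip(rewards_a, rewards_b), start=1):
        gamma = 1.0 / (n + 1)  # assumed step sequence, for illustration only
        x = narendra_step(x, gamma, ra, rb, rng.random())
    return x


if __name__ == "__main__":
    horizon = 10_000
    # Hypothetical deterministic payoffs: A pays 2 steps out of 3, B pays 1 out of 3.
    rewards_a = [1 if i % 3 != 0 else 0 for i in range(horizon)]
    rewards_b = [1 if i % 3 == 0 else 0 for i in range(horizon)]
    print(run_narendra(rewards_a, rewards_b))
```

Because each update is a convex combination of the current value with 0 or 1, `x` stays in the unit interval whenever the step size lies in (0, 1).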
convergence
stochastic algorithms