Average cost temporal-difference learning (Q1805802)
From MaRDI portal
Property / author: Benjamin van Roy
Revision as of 03:55, 14 February 2024
Language | Label | Description | Also known as |
---|---|---|---|
English | Average cost temporal-difference learning | scientific article | |
Statements
Average cost temporal-difference learning (English)
28 February 2000
The authors propose a variant of temporal-difference learning that approximates the average cost and the differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally along a single endless trajectory of the Markov chain. A proof of convergence and a characterization of the limit are presented, together with a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain.
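The update scheme summarized in the review can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest one-step (TD(0)-style) variant without eligibility traces, and the two-state chain `P`, the costs `g`, the single basis function `phi`, the step-size schedule, and the function name `average_cost_td0` are all hypothetical choices made for illustration.

```python
import random


def average_cost_td0(P, g, phi, steps=200_000, seed=0):
    """One-step average-cost TD sketch with linear function approximation.

    P   : transition matrix as a list of rows (hypothetical toy chain)
    g   : per-state costs
    phi : per-state feature vectors (the fixed basis functions)
    Returns the average-cost estimate mu and the weight vector r.
    """
    rng = random.Random(seed)
    r = [0.0] * len(phi[0])   # weights of the basis functions
    mu = 0.0                  # running estimate of the average cost
    x = 0                     # start of the single trajectory

    for t in range(steps):
        # Diminishing step size; this schedule is an assumption, not from the source.
        gamma = 20.0 / (200.0 + t)

        # Sample the next state from row x of P.
        u, acc, x_next = rng.random(), 0.0, len(P[x]) - 1
        for j, p in enumerate(P[x]):
            acc += p
            if u < acc:
                x_next = j
                break

        # Approximate differential costs at the current and next state.
        v = sum(w * f for w, f in zip(r, phi[x]))
        v_next = sum(w * f for w, f in zip(r, phi[x_next]))

        # Temporal difference for the average-cost setting.
        d = g[x] - mu + v_next - v

        # Incremental updates of the weights and of the average-cost estimate.
        r = [w + gamma * d * f for w, f in zip(r, phi[x])]
        mu += gamma * (g[x] - mu)

        x = x_next

    return mu, r


# Hypothetical irreducible aperiodic two-state chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]
g = [0.0, 6.0]
phi = [[0.0], [1.0]]   # one basis function: indicator of state 1

mu_hat, r_hat = average_cost_td0(P, g, phi)
print(mu_hat, r_hat)
```

For this toy chain the stationary distribution is (5/6, 1/6), so the true average cost is 1, and solving the Poisson equation gives a differential cost gap of 10 between state 1 and state 0. The estimates `mu_hat` and `r_hat[0]` should therefore land near 1 and 10 respectively, up to sampling noise.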
dynamic programming
learning
average cost
aperiodic Markov chain
convergence
mixing time