Average cost temporal-difference learning (Q1805802)

Removed claim: author (P16): Benjamin van Roy (Item:Q399882), normal rank

Revision as of 04:55, 14 February 2024

scientific article
Language | Label | Description | Also known as
English | Average cost temporal-difference learning | scientific article |

    Statements

    Average cost temporal-difference learning (English)
    28 February 2000
    The authors propose a variant of temporal-difference learning that approximates the average cost and the differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally along a single infinitely long trajectory of the Markov chain. The authors prove convergence and characterize the limit. They also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain.
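The summary above describes the method only in words. The following is a minimal sketch of the update loop it refers to, assuming the commonly cited average-cost TD(lambda) form with an eligibility trace. The review text does not state the update rule, so the rule itself, the function name `average_cost_td`, the step-size schedules, and the two-state chain used in the example are all illustrative assumptions, not details taken from the article.

```python
import random


def average_cost_td(P, g, phi, lam=0.5, steps=200_000, seed=0):
    """Sketch of average-cost TD(lambda) with linear function approximation.

    P   : row-stochastic transition matrix (list of lists)
    g   : per-state cost
    phi : per-state feature vector (the fixed basis functions)
    Returns (mu, r): the average-cost estimate and the weight vector.
    The step sizes below are hypothetical choices made for this example.
    """
    rng = random.Random(seed)
    states = range(len(P))
    r = [0.0] * len(phi[0])   # weights of the basis functions
    mu = 0.0                  # running estimate of the average cost
    x = 0
    z = list(phi[x])          # eligibility trace, starts at phi(x_0)
    for t in range(steps):
        # Sample the next state along one single trajectory.
        x_next = rng.choices(states, weights=P[x])[0]
        v = sum(ri * fi for ri, fi in zip(r, phi[x]))
        v_next = sum(ri * fi for ri, fi in zip(r, phi[x_next]))
        # Temporal difference for the differential cost.
        d = g[x] - mu + v_next - v
        gamma = 20.0 / (t + 200.0)   # assumed step size for the weights
        eta = 1.0 / (t + 1.0)        # assumed step size for the average cost
        r = [ri + gamma * d * zi for ri, zi in zip(r, z)]
        mu = (1.0 - eta) * mu + eta * g[x]
        z = [lam * zi + fi for zi, fi in zip(z, phi[x_next])]
        x = x_next
    return mu, r


if __name__ == "__main__":
    # Hypothetical two-state chain: stationary distribution (2/3, 1/3).
    P = [[0.9, 0.1], [0.2, 0.8]]
    g = [1.0, 4.0]
    phi = [[0.0], [1.0]]      # one basis function, not constant across states
    mu, r = average_cost_td(P, g, phi)
    print("average cost estimate:", mu)
    print("weight vector:", r)
```

For this particular chain the quantities can be worked out by hand: the exact average cost is 2, and the differential costs of the two states differ by 10, so the estimates should settle near those values. The single basis function is deliberately non-constant, since differential costs are only defined up to an additive constant.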
    dynamic programming
    learning
    average cost
    aperiodic Markov chain
    convergence
    mixing time