Average cost temporal-difference learning (Q1805802)

scientific article

    Statements

    Average cost temporal-difference learning (English)
    28 February 2000
    The authors propose a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally along a single endless trajectory of the Markov chain. The authors also present a proof of convergence and a characterization of the limit. They provide a bound on the resulting approximation error that depends on the "mixing time" of the Markov chain.
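
The abstract above does not spell out the update rules, so the following is only a minimal sketch of how an average-cost TD(lambda) iteration with linear function approximation is commonly written. Everything in it is an assumption made for illustration rather than the authors' exact method: the function name `average_cost_td`, the three-state chain `P`, the costs `g`, the features `phi`, the trace parameter `lam`, and the `1/(t+1)` step size.

```python
import random


def average_cost_td(P, g, phi, lam=0.5, steps=100_000, seed=0):
    """Average-cost TD(lambda) with linear features along one trajectory.

    P   : row-stochastic transition matrix (list of lists)
    g   : per-state cost
    phi : per-state feature vector (values of the fixed basis functions)
    Returns (r, mu): basis-function weights and the average-cost estimate.
    """
    rng = random.Random(seed)
    n, k = len(P), len(phi[0])
    r = [0.0] * k            # weights of the basis functions
    mu = 0.0                 # running estimate of the average cost
    state = 0
    z = list(phi[state])     # eligibility trace, starts at phi(i_0)
    for t in range(steps):
        # Sample the next state of the single simulated trajectory.
        nxt = rng.choices(range(n), weights=P[state])[0]
        step = 1.0 / (t + 1)  # assumed diminishing step size
        v_now = sum(w * f for w, f in zip(r, phi[state]))
        v_next = sum(w * f for w, f in zip(r, phi[nxt]))
        # Temporal difference for the differential cost.
        d = g[state] - mu + v_next - v_now
        r = [w + step * d * e for w, e in zip(r, z)]
        # With this step size, mu is the running mean of observed costs.
        mu += step * (g[state] - mu)
        z = [lam * e + f for e, f in zip(z, phi[nxt])]
        state = nxt
    return r, mu


# Hypothetical toy problem: an irreducible, aperiodic three-state chain.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
g = [1.0, 2.0, 6.0]
# Two basis functions; the constant vector is not in their span.
phi = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```

Because the toy matrix `P` is doubly stochastic, its stationary distribution is uniform, so the true average cost is (1 + 2 + 6) / 3 = 3. Calling `average_cost_td(P, g, phi)` should therefore return a `mu` close to 3, while `r` holds the weights of the differential-cost approximation.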
    dynamic programming
    learning
    average cost
    aperiodic Markov chain
    convergence
    mixing time