Average cost temporal-difference learning (Q1805802)

From MaRDI portal
Author: Benjamin van Roy
MaRDI profile type: MaRDI publication profile

Latest revision as of 04:45, 5 March 2024

Label (English): Average cost temporal-difference learning
Description: scientific article

    Statements

    Average cost temporal-difference learning (English)
    28 February 2000
    The authors propose a variant of temporal-difference learning that approximates the average cost and the differential costs of an irreducible, aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally along a single, infinitely long trajectory of the chain. The paper proves convergence and characterizes the limit to which the iterates converge. It also gives a bound on the resulting approximation error whose dependence on the "mixing time" of the Markov chain is of particular interest.
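The review does not state the update equations, so the following is only a minimal Python sketch of average-cost TD(lambda) with a linear architecture, written in the form commonly used for this problem rather than copied from the paper. Along one simulated trajectory it keeps an average-cost estimate `mu`, a weight vector `r` for the fixed basis functions `phi`, and an eligibility trace `z`. The three-state chain, the costs `g`, the features, the trace parameter `lam`, the weight step size `gamma` and the running-mean update for `mu` are all illustrative assumptions. The features are chosen so that the constant vector is not in their span; otherwise the weights would only be determined up to an added constant.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def average_cost_td(P, g, phi, lam=0.5, steps=100_000, seed=0):
    """Sketch of average-cost TD(lambda) with linear basis functions.

    P   -- transition matrix (list of rows), assumed irreducible and aperiodic
    g   -- one-step cost g(x) for each state x
    phi -- fixed basis-function vector phi(x) for each state x
    Returns (mu, r): the average-cost estimate and weights r such that
    phi(x) . r approximates the differential cost of x up to a constant.
    """
    rng = random.Random(seed)
    states = range(len(P))
    mu = 0.0                  # average-cost estimate
    r = [0.0] * len(phi[0])   # basis-function weights
    x = 0                     # arbitrary start state
    z = list(phi[x])          # eligibility trace, z_0 = phi(x_0)
    for t in range(steps):
        # Follow the single trajectory: sample x_{t+1} from row x of P.
        y = rng.choices(states, weights=P[x])[0]
        gamma = 0.05 / (1 + t / 500)  # hypothetical decreasing step size
        # Temporal difference with the current average-cost estimate removed.
        d = g[x] - mu + dot(phi[y], r) - dot(phi[x], r)
        r = [ri + gamma * d * zi for ri, zi in zip(r, z)]
        mu += (g[x] - mu) / (t + 1)   # running mean of the observed costs
        z = [lam * zi + fi for zi, fi in zip(z, phi[y])]
        x = y
    return mu, r


# Hypothetical three-state chain; the self-loops make it aperiodic.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
g = [1.0, 0.0, 3.0]
phi = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # constant vector not in the span

mu, r = average_cost_td(P, g, phi)
print(f"mu = {mu:.3f}, r = {[round(w, 2) for w in r]}")
```

For this chain the stationary distribution is (1/4, 1/2, 1/4), so the exact average cost is 0.25·1 + 0.5·0 + 0.25·3 = 1.0. The differential costs, shifted so that state 1 has value zero, are (0, 0, 4); these lie in the span of the chosen features, so `r` should settle near [0, 4].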
    dynamic programming
    learning
    average cost
    aperiodic Markov chain
    convergence
    mixing time
