Average cost temporal-difference learning (Q1805802)

scientific article; zbMATH DE number 1355384

      Statements

      Average cost temporal-difference learning (English)
      28 February 2000
      The authors propose a variant of temporal-difference learning that approximates the average cost and the differential costs of an irreducible, aperiodic Markov chain. The differential costs are approximated by linear combinations of fixed basis functions, whose weights are updated incrementally along a single, endless trajectory of the chain. The authors also prove convergence and characterize the limit to which the algorithm converges. Finally, they provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain.
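      The kind of update the review describes can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `average_cost_td`, the three-state chain, the indicator features, and the step-size choices are all hypothetical. It assumes the usual average-cost TD(λ) form with linear features: a scalar estimate `mu` of the average cost is updated from observed costs, and the weights `r` move along an eligibility trace in proportion to a temporal difference that subtracts the current average-cost estimate.

```python
import random

def average_cost_td(P, g, phi, lam=0.0, alpha=0.01, steps=100_000, seed=0):
    """Sketch of average-cost TD(lambda) with linear features on one trajectory.

    P     : transition matrix as a list of rows (assumed irreducible, aperiodic)
    g     : g[i] is the cost incurred in state i
    phi   : phi[i] is the feature vector of state i (fixed basis functions)
    lam   : trace-decay parameter, 0 <= lam < 1
    alpha : constant weight step size (an assumption; convergence proofs
            usually rely on decaying step sizes)
    Returns (mu, r): the average-cost estimate and the weight vector, so that
    phi[i] . r approximates the differential cost of state i.
    """
    rng = random.Random(seed)
    n, k = len(P), len(phi[0])
    r = [0.0] * k   # weights of the basis functions
    z = [0.0] * k   # eligibility trace
    mu = 0.0        # running estimate of the average cost
    i = 0
    for t in range(steps):
        j = rng.choices(range(n), weights=P[i])[0]  # simulate one transition
        # Temporal difference: cost minus the average-cost estimate, plus the
        # change in approximate differential cost from state i to state j.
        d = (g[i] - mu
             + sum(f * w for f, w in zip(phi[j], r))
             - sum(f * w for f, w in zip(phi[i], r)))
        z = [lam * zk + f for zk, f in zip(z, phi[i])]   # decay trace, add phi(i)
        r = [w + alpha * d * zk for w, zk in zip(r, z)]  # weight update along trace
        # Average-cost update with eta_t = 1/(t+1), i.e. a running sample mean.
        mu += (g[i] - mu) / (t + 1)
        i = j
    return mu, r

# Hypothetical 3-state chain: irreducible (0 -> 1 -> 2 -> 0), aperiodic (self-loops).
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.0, 0.6]]
g = [1.0, 2.0, 0.0]
# Indicator features for states 0 and 1; no weighted sum of them is constant
# across all three states, so they cannot absorb the average-cost term.
phi = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

mu, r = average_cost_td(P, g, phi, lam=0.5)
print(mu, r)
```

      For this chain the stationary distribution is (28, 20, 25)/73, so the true average cost is 68/73 ≈ 0.93, and `mu` should settle close to that value. The features are chosen so that no combination of them is constant across states: a constant offset would be indistinguishable from the average-cost term, and here state 2 effectively serves as the reference state whose differential cost is pinned at zero.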
      Keywords: dynamic programming, learning, average cost, aperiodic Markov chain, convergence, mixing time
