Average cost temporal-difference learning (Q1805802): Difference between revisions

The authors propose a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally along a single endless trajectory of the Markov chain. A proof of convergence and a characterization of the limit are presented, together with a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain.
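
The incremental update described in the review can be illustrated with a minimal sketch. It assumes the commonly stated form of the average-cost TD(λ) recursion (a running average-cost estimate, a temporal difference built from the cost minus that estimate, and an eligibility trace); it is not taken from the reviewed paper's text. The three-state chain `P`, the costs `g`, the basis matrix `Phi`, the step size `1/(t+1)`, and the function name `average_cost_td` are all hypothetical choices made for illustration.

```python
import numpy as np


def average_cost_td(P, g, Phi, lam=0.5, steps=20000, seed=0):
    """Sketch of average-cost TD(lambda) with linear function approximation.

    P   : (n, n) transition matrix of an irreducible aperiodic chain (assumed)
    g   : (n,) per-state costs (assumed)
    Phi : (n, k) fixed basis functions, one row per state (assumed)
    Returns the average-cost estimate mu and the weight vector r.
    """
    rng = np.random.default_rng(seed)
    n, k = Phi.shape
    x = 0                      # arbitrary start state
    mu = 0.0                   # running estimate of the average cost
    r = np.zeros(k)            # weights of the differential-cost approximation
    z = Phi[x].copy()          # eligibility trace, initialised to phi(x_0)
    for t in range(steps):
        x_next = rng.choice(n, p=P[x])   # one step along a single trajectory
        gamma = 1.0 / (t + 1)            # diminishing step size (assumption)
        # Temporal difference: cost minus average-cost estimate,
        # plus the change in the approximate differential cost.
        d = g[x] - mu + Phi[x_next] @ r - Phi[x] @ r
        r = r + gamma * d * z
        mu = (1.0 - gamma) * mu + gamma * g[x]
        z = lam * z + Phi[x_next]
        x = x_next
    return mu, r


# Hypothetical 3-state chain with stationary distribution (0.25, 0.5, 0.25),
# so the true average cost of g is 0.25*0 + 0.5*1 + 0.25*4 = 1.5.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
g = np.array([0.0, 1.0, 4.0])
# Two indicator features; the constant vector is not in their span.
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])

mu, r = average_cost_td(P, g, Phi)
print("estimated average cost:", mu)
print("weights:", r)
```

With the step size `1/(t+1)`, the estimate `mu` is simply the empirical mean of the costs seen along the trajectory, which is why it approaches the stationary average. The weights `r` determine the differential costs only up to the approximation error discussed in the review.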

reviewed by

Wang Cheng-Shu

zbMATH Keywords

dynamic programming

learning

average cost

aperiodic Markov chain

convergence

mixing time

Identifiers

zbMATH Open document ID

0932.93085

DOI

10.1016/S0005-1098(99)00099-0

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:1805802

Revision as of 03:55, 14 February 2024: RedirectionBot (Bots, 2,880,369 edits), Removed claim: author (P16): Item:Q399882
Revision as of 03:55, 14 February 2024: RedirectionBot (Bots, 2,880,369 edits), Changed an Item
	Property / author
		Benjamin van Roy
	Property / author: Benjamin van Roy / rank
		Normal rank