Least squares policy evaluation algorithms with linear function approximation (Q1870310): Difference between revisions

This paper deals with policy evaluation algorithms within the framework of infinite-horizon dynamic programming problems with discounted cost. The authors consider the discrete-time stationary Markov chain with state space \(\{1,2,\dots, n\}\) and the cost vector \(J\), given by \[ J(i)= E\Biggl[\sum^\infty_{t=0} \alpha^t g(i_t, i_{t+ 1})/i_0= i\Biggr], \] \(i= 1,2,\dots,n\), where \(i_t\) denotes the state of the system at time \(t\). The authors discuss two methods to calculate \(J\) approximately, using simulation, temporal difference and linear cost function approximation. The first method is a new algorithm, including the use of least square subproblems that are solved recursively and with a diminishing stepsize. The second method is the \(\text{LSTD}(\lambda)\) algorithm, proposed by Boyan (to appear in Machine Learning). They state convergence results for these two algorithms.

0 references

zbMATH Keywords

linear function approximation

0 references

martingale

0 references

least-square methods

0 references

policy evaluation algorithms

0 references

infinite-horizon dynamic programming

0 references

discrete-time stationary Markov chain

0 references

simulation

0 references

temporal difference

0 references

stepsize

0 references

\(\text{LSTD}(\lambda)\) algorithm

0 references

convergence results

0 references

reviewed by

Makiko Nisio

0 references

MaRDI profile type

MaRDI publication profile

0 references

Identifiers

zbMATH Open document ID

1030.93061

0 references

DOI

10.1023/A:1022192903948

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

0 references

0 references

0 references

Sitelinks

Mathematics(1 entry)

mardi Publication:1870310

Revision as of 01:45, 15 February 2024 RedirectionBot (talk \| contribs) Bots 2,880,369 edits ‎Changed an Item ← Older edit	Latest revision as of 05:00, 5 March 2024 Import240304020342 (talk \| contribs) 4,416,906 edits Set profile property.
	Property / MaRDI profile type
		MaRDI publication profile
	Property / MaRDI profile type: MaRDI publication profile / rank
		Normal rank