Least squares policy evaluation algorithms with linear function approximation (Q1870310): Difference between revisions
From MaRDI portal
Changed an Item |
Set profile property. |
||
Property / MaRDI profile type | |||
Property / MaRDI profile type: MaRDI publication profile / rank | |||
Normal rank |
Latest revision as of 05:00, 5 March 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Least squares policy evaluation algorithms with linear function approximation |
scientific article |
Statements
Least squares policy evaluation algorithms with linear function approximation (English)
0 references
11 May 2003
0 references
This paper deals with policy evaluation algorithms within the framework of infinite-horizon dynamic programming problems with discounted cost. The authors consider the discrete-time stationary Markov chain with state space \(\{1,2,\dots, n\}\) and the cost vector \(J\), given by \[ J(i)= E\Biggl[\sum^\infty_{t=0} \alpha^t g(i_t, i_{t+ 1})/i_0= i\Biggr], \] \(i= 1,2,\dots,n\), where \(i_t\) denotes the state of the system at time \(t\). The authors discuss two methods to calculate \(J\) approximately, using simulation, temporal difference and linear cost function approximation. The first method is a new algorithm, including the use of least square subproblems that are solved recursively and with a diminishing stepsize. The second method is the \(\text{LSTD}(\lambda)\) algorithm, proposed by Boyan (to appear in Machine Learning). They state convergence results for these two algorithms.
0 references
linear function approximation
0 references
martingale
0 references
least-square methods
0 references
policy evaluation algorithms
0 references
infinite-horizon dynamic programming
0 references
discrete-time stationary Markov chain
0 references
simulation
0 references
temporal difference
0 references
stepsize
0 references
\(\text{LSTD}(\lambda)\) algorithm
0 references
convergence results
0 references