Least squares policy evaluation algorithms with linear function approximation (Q1870310): Difference between revisions

From MaRDI portal
RedirectionBot (talk | contribs)
Removed claims
RedirectionBot (talk | contribs)
Changed an Item
Property / author
 
Property / author: Angelia Nedić / rank
 
Normal rank
Property / reviewed by
 
Property / reviewed by: Makiko Nisio / rank
 
Normal rank

Revision as of 01:45, 15 February 2024

scientific article
Language Label Description Also known as
English
Least squares policy evaluation algorithms with linear function approximation
scientific article

    Statements

    Least squares policy evaluation algorithms with linear function approximation (English)
    0 references
    0 references
    0 references
    11 May 2003
    0 references
    This paper deals with policy evaluation algorithms within the framework of infinite-horizon dynamic programming problems with discounted cost. The authors consider the discrete-time stationary Markov chain with state space \(\{1,2,\dots, n\}\) and the cost vector \(J\), given by \[ J(i)= E\Biggl[\sum^\infty_{t=0} \alpha^t g(i_t, i_{t+ 1})/i_0= i\Biggr], \] \(i= 1,2,\dots,n\), where \(i_t\) denotes the state of the system at time \(t\). The authors discuss two methods to calculate \(J\) approximately, using simulation, temporal difference and linear cost function approximation. The first method is a new algorithm, including the use of least square subproblems that are solved recursively and with a diminishing stepsize. The second method is the \(\text{LSTD}(\lambda)\) algorithm, proposed by Boyan (to appear in Machine Learning). They state convergence results for these two algorithms.
    0 references
    linear function approximation
    0 references
    martingale
    0 references
    least-square methods
    0 references
    policy evaluation algorithms
    0 references
    infinite-horizon dynamic programming
    0 references
    discrete-time stationary Markov chain
    0 references
    simulation
    0 references
    temporal difference
    0 references
    stepsize
    0 references
    \(\text{LSTD}(\lambda)\) algorithm
    0 references
    convergence results
    0 references
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references