The variational calculus and approximation in policy space for Markovian decision processes (Q1068009)
From MaRDI portal
scientific article; zbMATH DE number 3928751
| Language | Label | Description | Also known as |
|---|---|---|---|
| default for all languages | No label defined | | |
| English | The variational calculus and approximation in policy space for Markovian decision processes | scientific article; zbMATH DE number 3928751 | |
Statements
The variational calculus and approximation in policy space for Markovian decision processes (English)
1985
The functional equations of Markovian decision processes yield the state values (and, in the undiscounted case, the gain rate). Variational expressions are exhibited here for these state values (and gain rate); these expressions are stationary when evaluated at the correct values. When guesses for the values (and gain rate) are inserted into these variational expressions, a superior guess is usually obtained. Repetition of this procedure is shown to be equivalent to the method of successive approximations in policy space. Two further unusual features of this procedure: when the linear equations determining the Lagrange multipliers are non-singular, the variational expressions for the state values are precisely one Newton-Raphson iteration; and when applied to a linear objective function with piecewise-linear constraints, the case that arises for the functional equations of Markovian decision processes, the variational test quantity is piecewise constant, i.e., its first and all higher variations vanish. The latter explains the procedure's good performance (one-step convergence) when good estimates are available.
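The method of successive approximations in policy space mentioned above is commonly realized as policy iteration: solve the linear value-determination equations for the current policy, then improve the policy greedily, and repeat until the policy is stationary. The sketch below illustrates this for a small discounted MDP; the two-state transition matrices, rewards, and discount factor are invented illustration data, not taken from the article.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (illustration only).
# P[a][s, s'] = transition probability under action a; r[a][s] = reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # action 1
r = [np.array([1.0, 0.0]),
     np.array([2.0, -1.0])]
beta = 0.95   # discount factor
n = 2         # number of states

def evaluate(policy):
    """Value determination: solve (I - beta * P_pi) v = r_pi for the state values."""
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    r_pi = np.array([r[policy[s]][s] for s in range(n)])
    return np.linalg.solve(np.eye(n) - beta * P_pi, r_pi)

def improve(v):
    """Policy improvement: pick, per state, the action maximizing r_a + beta * P_a v."""
    q = np.array([r[a] + beta * P[a] @ v for a in range(len(P))])
    return q.argmax(axis=0)

policy = np.zeros(n, dtype=int)  # arbitrary initial guess
while True:
    v = evaluate(policy)
    new_policy = improve(v)
    if np.array_equal(new_policy, policy):
        break  # policy is stationary, hence optimal
    policy = new_policy

print("optimal policy:", policy, "state values:", v)
```

Each pass solves a linear system and then re-optimizes the policy; this is the same "evaluate a guess, obtain a superior guess" loop the review describes, and in the non-singular case each pass acts as one Newton-Raphson step on the functional equations.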
Keywords: variational calculus; state values; gain rate; successive approximations in policy space