On-line policy gradient estimation with multi-step sampling (Q5962027)
scientific article; zbMATH DE number 5786411
Statements
Title: On-line policy gradient estimation with multi-step sampling (English)
Publication date: 16 September 2010
The authors discuss sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption, and examples are given to show that these approaches cannot provide an accurate gradient estimate when the assumption does not hold. The authors show that the assumption can be relaxed and propose several new algorithms based on multi-step sampling that do not require it. All the algorithms can be implemented on sample paths, so policy gradients can be estimated on-line.
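To make the setting concrete, here is a minimal sketch of on-line, sample-path-based policy gradient estimation for an average-reward Markov decision process. It implements only the standard single-step likelihood-ratio (GPOMDP-style) estimator that such algorithms build on, not the paper's multi-step sampling algorithms; the tabular MDP, softmax parameterization, and trace discount `beta` below are all illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of on-line policy gradient estimation along a single
# sample path of a small, hypothetical tabular MDP with a softmax policy.
# This is the standard single-step likelihood-ratio estimator; the paper's
# multi-step sampling variants are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Hypothetical model, used only to generate the sample path.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
r = rng.standard_normal((n_states, n_actions))                    # rewards

theta = np.zeros((n_states, n_actions))  # softmax policy parameters

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

# On-line estimator: maintain an eligibility trace z_t of score functions
# and incrementally average reward_t * z_t to estimate the gradient.
beta = 0.95                      # trace discount; trades bias for variance
z = np.zeros_like(theta)         # eligibility trace
grad = np.zeros_like(theta)      # running gradient estimate
s = 0
T = 200_000

for t in range(1, T + 1):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    # Score function: grad of log pi(a|s) for the softmax parameterization.
    score = np.zeros_like(theta)
    score[s] = -pi
    score[s, a] += 1.0
    z = beta * z + score
    grad += (r[s, a] * z - grad) / t   # incremental average, updated on-line
    s = rng.choice(n_states, p=P[s, a])

print("on-line gradient estimate:\n", grad)
```

Everything here is computed from the observed states, actions, and rewards alone, which is what makes the estimate "on-line": no model knowledge is needed beyond the ability to evaluate and differentiate the policy's own action probabilities.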
Keywords: Markov reward processes; on-line estimation; performance potentials