Action time sharing policies for ergodic control of Markov chains (Q2884591)

scientific article; zbMATH DE number 6039281

Statements

published in

SIAM Journal on Control and Optimization

publication date

30 May 2012

zbMATH Keywords

Markov decision process

adaptive control

ergodic control

long-time average cost

full work available at URL

https://doi.org/10.1137/100798557

review text

Ergodic control for discrete-time controlled Markov chains with a locally compact state space and a compact action space is considered under suitable stability, irreducibility, and Feller continuity conditions. A flexible family of controls, called action time sharing (ATS) policies, associated with a given continuous stationary Markov control, is introduced. It is shown that, for a broad range of one-stage cost functions, the long-term average cost of such a control policy is the same as that of the associated stationary Markov policy. In addition, ATS policies are well suited to a range of estimation, information collection, and adaptive control goals. To illustrate the possibilities, we present two examples. The first demonstrates the construction of an ATS policy that leads to consistent estimators for unknown model parameters while producing the desired long-term average cost value. The second example considers a setting where the target stationary Markov control \(q\) is not known but sampling schemes are available that allow for consistent estimation of \(q\). We construct an ATS policy that uses dynamic estimators of \(q\) for its control decisions and show that the associated cost coincides with that of the unknown Markov control \(q\).
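The central claim, that sharing time among actions can reproduce the long-term average cost of a stationary Markov control, can be illustrated numerically. The following Python sketch rests on simplifying assumptions that are not the setting of the paper. It uses finite state and action sets instead of a locally compact state space and a compact action space, and an arbitrary toy model (the arrays `P`, `C`, `Q` are hypothetical). It also uses a simple deficit-counting rule (`time_sharing_action`, likewise hypothetical) that is only one way to time-share actions and is not claimed to be the authors' ATS construction. The sketch compares the exact average cost \(\sum_x \mu_q(x)\sum_a q(x,a)c(x,a)\) of a stationary randomized control \(q\), where \(\mu_q\) is the invariant distribution of the chain under \(q\), with empirical averages for two policies. One policy samples a fresh action from \(q(x,\cdot)\) at every step. The other deterministically time-shares actions so that their frequencies in each state approach \(q(x,\cdot)\).

```python
import random

# Hypothetical toy model (an assumption, not taken from the paper):
# 3 states, 2 actions. P[x][a] is the next-state distribution and
# C[x][a] the one-stage cost of using action a in state x.
P = [
    [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]],
    [[0.3, 0.4, 0.3], [0.6, 0.2, 0.2]],
    [[0.2, 0.2, 0.6], [0.4, 0.4, 0.2]],
]
C = [[1.0, 0.0], [0.2, 0.8], [0.5, 0.9]]
# Target stationary randomized Markov control q(x, a) (also hypothetical).
Q = [[0.3, 0.7], [0.5, 0.5], [0.9, 0.1]]


def exact_average_cost(P, C, Q, iters=2000):
    """Return sum_x mu_q(x) sum_a q(x, a) c(x, a), where mu_q is the
    invariant distribution of the chain under q (found by power iteration)."""
    n = len(P)
    # Transition matrix under q: Pq[x][y] = sum_a q(x, a) P[x][a][y].
    Pq = [[sum(Q[x][a] * P[x][a][y] for a in range(len(Q[x]))) for y in range(n)]
          for x in range(n)]
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * Pq[x][y] for x in range(n)) for y in range(n)]
    return sum(mu[x] * Q[x][a] * C[x][a]
               for x in range(n) for a in range(len(Q[x])))


def randomized_action(x, rng, counts):
    # Stationary Markov policy: draw a fresh action from q(x, .) at every step.
    return rng.choices(range(len(Q[x])), weights=Q[x])[0]


def time_sharing_action(x, rng, counts):
    # Deterministic time-sharing rule (hypothetical): on the (n+1)-th visit to x,
    # use the action whose count lags furthest behind its target q(x, a) * (n + 1).
    # The lags stay bounded, so the fraction of visits to x on which action a
    # is used converges to q(x, a) without any randomization.
    n = sum(counts[x])
    return max(range(len(Q[x])), key=lambda a: Q[x][a] * (n + 1) - counts[x][a])


def simulate(choose, steps=100_000, seed=0):
    """Empirical long-run average cost of the policy `choose`, started at state 0."""
    rng = random.Random(seed)
    counts = [[0] * len(Q[x]) for x in range(len(P))]  # counts[x][a]
    x, total = 0, 0.0
    for _ in range(steps):
        a = choose(x, rng, counts)
        counts[x][a] += 1
        total += C[x][a]
        x = rng.choices(range(len(P)), weights=P[x][a])[0]
    return total / steps


if __name__ == "__main__":
    print("exact (under q):    ", exact_average_cost(P, C, Q))
    print("randomized policy:  ", simulate(randomized_action))
    print("time-sharing policy:", simulate(time_sharing_action))
```

Running the script should show both empirical averages close to the exact value. In this finite toy setting, the deficit-counting rule keeps the gap between the number of times each action was used in a state and its target share bounded. This is why the sketch needs no per-step randomization.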

reviewed by

Marius Iosifescu
