Action time sharing policies for ergodic control of Markov chains (Q2884591)
scientific article; zbMATH DE number 6039281
Statements
Publication date: 30 May 2012
Keywords: Markov decision process; adaptive control; ergodic control; long-time average cost
Title: Action time sharing policies for ergodic control of Markov chains (English)
Ergodic control for discrete-time controlled Markov chains with a locally compact state space and a compact action space is considered under suitable stability, irreducibility, and Feller continuity conditions. A flexible family of controls, called action time sharing (ATS) policies, associated with a given continuous stationary Markov control, is introduced. It is shown that the long-term average cost for such a control policy, for a broad range of one-stage cost functions, is the same as that for the associated stationary Markov policy. In addition, ATS policies are well suited to a range of estimation, information collection, and adaptive control goals. To illustrate the possibilities we present two examples. The first demonstrates the construction of an ATS policy that leads to consistent estimators for unknown model parameters while producing the desired long-term average cost. The second example considers a setting where the target stationary Markov control \(q\) is not known but sampling schemes are available that allow consistent estimation of \(q\). We construct an ATS policy that uses dynamic estimators of \(q\) for control decisions and show that the associated cost coincides with that for the unknown Markov control \(q\).
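For intuition, the following is a minimal sketch, not taken from the paper, of the time-sharing idea on a toy finite-state, finite-action model: at each visit to a state, the action whose empirical usage most lags its target share under a given randomized control \(q\) is applied, so that per-state action frequencies track \(q\), and the long-run average cost should be close to that obtained by sampling actions from \(q\) directly. The model (`P`, `c`, `q`), the deficit-based scheduler, and all sizes and names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Illustrative toy model (assumed, not from the paper):
# states 0..S-1, actions 0..A-1, horizon T.
rng = np.random.default_rng(0)
S, A, T = 5, 3, 100_000

P = rng.dirichlet(np.ones(S), size=(A, S))  # P[a, x] = next-state distribution
c = rng.uniform(0.0, 1.0, size=(S, A))      # one-stage cost c(x, a)
q = rng.dirichlet(np.ones(A), size=S)       # target randomized control q(a | x)

# Deficit-based time sharing: at state x, apply the action whose empirical
# usage count most lags its target share q[x, a] of the visits to x.
counts = np.zeros((S, A))
x, ats_cost = 0, 0.0
for t in range(T):
    visits = counts[x].sum()
    deficit = (visits + 1) * q[x] - counts[x]
    a = int(np.argmax(deficit))
    counts[x, a] += 1
    ats_cost += c[x, a]
    x = int(rng.choice(S, p=P[a, x]))

# For comparison: run the chain sampling actions directly from q.
x, q_cost = 0, 0.0
for t in range(T):
    a = int(rng.choice(A, p=q[x]))
    q_cost += c[x, a]
    x = int(rng.choice(S, p=P[a, x]))

print("ATS long-run average cost:", ats_cost / T)
print("Stationary Markov control:", q_cost / T)
```

For long horizons the two printed averages should nearly coincide, mirroring the paper's claim that an ATS policy reproduces the long-term average cost of its associated stationary Markov control.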