Policy iteration for continuous-time average reward Markov decision processes in Polish spaces (Q963139)
From MaRDI portal
Latest revision as of 15:45, 2 July 2024
scientific article
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Policy iteration for continuous-time average reward Markov decision processes in Polish spaces | scientific article | |
Statements
Policy iteration for continuous-time average reward Markov decision processes in Polish spaces (English)
8 April 2010
Summary: We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The optimality criterion is the expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
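The paper treats general (Polish) state spaces with possibly unbounded rates; as an illustration only, the alternation between policy evaluation (solving the Poisson/average-reward optimality equation for a fixed policy) and policy improvement can be sketched in the finite-state, finite-action special case, where the generator of each stationary policy is a transition rate matrix. The function name, the data layout, and the normalization `h(0) = 0` below are assumptions of this sketch, not the paper's construction.

```python
import numpy as np

def policy_iteration_ctmdp(Q, r, max_iter=100):
    """Average-reward policy iteration for a finite-state CTMDP (illustrative sketch).

    Q[a][i, j] -- transition rate from state i to state j under action a
                  (each row of Q[a] sums to 0, off-diagonal entries >= 0).
    r[a][i]    -- reward rate in state i under action a.
    Returns (gain g, bias h, stationary policy f).
    """
    n_actions, n_states = len(Q), Q[0].shape[0]
    f = np.zeros(n_states, dtype=int)  # start from an arbitrary stationary policy
    for _ in range(max_iter):
        # Policy evaluation: solve the Poisson equation
        #   g = r_f(i) + sum_j q_f(j|i) h(j)  for all i,  with h(0) = 0 fixed.
        Qf = np.array([Q[f[i]][i] for i in range(n_states)])
        rf = np.array([r[f[i]][i] for i in range(n_states)])
        # Unknown vector x = (g, h(1), ..., h(n-1)); h(0) = 0 pins down the bias.
        A = np.zeros((n_states, n_states))
        A[:, 0] = 1.0          # coefficient of the gain g in each equation
        A[:, 1:] = -Qf[:, 1:]  # coefficients of h(1), ..., h(n-1)
        x = np.linalg.solve(A, rf)
        g, h = x[0], np.concatenate(([0.0], x[1:]))
        # Policy improvement: maximize r(i, a) + sum_j q(j|i, a) h(j) per state.
        scores = np.array([[r[a][i] + Q[a][i] @ h for a in range(n_actions)]
                           for i in range(n_states)])
        f_new = scores.argmax(axis=1)
        if np.array_equal(f_new, f):
            break  # improvement changed nothing: f is average optimal
        f = f_new
    return g, h, f
```

For a unichain model the evaluation system is nonsingular, and the loop terminates once improvement leaves the policy unchanged, at which point `(g, h)` solves the average reward optimality equation for the finite model.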
transition rates
reward rates
average reward optimality equation