Another set of conditions for Markov decision processes with average sample-path costs (Q2506454)

From MaRDI portal
scientific article

    Statements

    Another set of conditions for Markov decision processes with average sample-path costs (English)
    28 September 2006
    This paper studies discrete-time Markov decision processes with Borel state and action spaces and with average costs per unit time. The main focus is on sufficient conditions for the existence of sample-path optimal and \(\varepsilon\)-optimal stationary policies. The cost functions are not assumed to be bounded. The authors formulate three assumptions: A, B, and C. Assumption A is a Lyapunov-type inequality condition for some weight function \(w\). Assumption B requires compactness of the action sets, lower semicontinuity of the one-step costs in the control parameter, setwise continuity of the transition probabilities in the control parameter, and continuity in the control parameter of the expected value of the function \(w\) at the next step. Assumption C requires the existence of discounted relative value functions that are bounded from above and below, for all discount factors, by two functions that are bounded in the weighted supremum norm with respect to the weight function \(w\). The main results of the paper are Theorems 4.1 and 4.2. Theorem 4.1 states that, under Assumptions A, B, and C, a pair of optimality inequalities holds and a stationary policy that attains the minimum in one of these inequalities is average sample-path optimal. This theorem also provides sufficient conditions for a stationary policy to be average sample-path \(\varepsilon\)-optimal. Theorem 4.2 states that, under stronger conditions, a stationary policy is average sample-path optimal if and only if it is average expected cost optimal.
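    For orientation, the objects named in the review can be written out in the notation customary for this literature. This is only a sketch: the symbols \(c\) (one-step cost), \(Q\) (transition kernel), \(x_t, a_t\) (state and action at time \(t\)), \(\beta\), and \(b\) are assumptions introduced here, and the paper's exact formulation of Assumption A may differ from the illustrative drift inequality shown last.

    ```latex
    % Average sample-path cost of a policy: a random variable, defined pathwise
    \[ J(\pi, x) = \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} c(x_t, a_t) \]

    % Average expected cost: the same time average, taken under the expectation
    \[ J_E(\pi, x) = \limsup_{n \to \infty} \frac{1}{n} \, \mathbb{E}^{\pi}_{x} \Big[ \sum_{t=0}^{n-1} c(x_t, a_t) \Big] \]

    % Weighted supremum norm with respect to a weight function w >= 1
    \[ \| u \|_w = \sup_{x} \frac{|u(x)|}{w(x)} \]

    % One common Lyapunov-type (drift) inequality, shown for illustration only
    \[ \int w(y) \, Q(dy \mid x, a) \le \beta \, w(x) + b, \qquad 0 < \beta < 1, \; b \ge 0 \]
    ```

    The first two displays show the distinction behind Theorem 4.2: sample-path optimality concerns the pathwise limit itself, whereas expected-cost optimality concerns the limit of its expectation.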
    discrete-time Markov decision process
    average sample-path cost
    optimality inequality
    optimal stationary policy

    Identifiers