Vector-valued semi-Markovian decision programming (Q1179954)

scientific article
    Statements

    Vector-valued semi-Markovian decision programming (English)
    27 June 1992
    This paper deals with vector-valued semi-Markov decision processes in which \(S\) is a non-empty countable state set; for each \(i\in S\), \(A(i)\) is a finite action set; \(\alpha\) is a discount parameter; \(\beta_\alpha(i,a)\) is the expected discounted time between state transitions, given \(\alpha\), \(i\in S\) and \(a\in A(i)\); \(r(i,a)\) is the vector-valued reward between successive transitions, given \(i\in S\) and \(a\in A(i)\); and \(q(j\mid i,a)\) is the probability of a transition to \(j\in S\), given \(i\in S\) and \(a\in A(i)\). Three assumptions are made to ensure boundedness in the subsequent analysis.

    For any policy \(\pi\) in the set \(\Pi\) of all policies, \(V_\alpha(\pi,i)\) is the infinite-horizon expected discounted vector-valued reward when the initial state is \(i\in S\). A policy \(\pi^*\in\Pi\) is called efficient in \(\Pi\) with respect to \(i\) if, whenever \(\pi\in\Pi\) and \(V_\alpha(\pi,i)\geq V_\alpha(\pi^*,i)\), then \(V_\alpha(\pi,i)=V_\alpha(\pi^*,i)\); \(\pi^*\) is called optimal if it is efficient with respect to every \(i\in S\). If \(V_\alpha(\pi^*,i)\geq V_\alpha(\pi,i)\) for all \(i\in S\) and all \(\pi\in\Pi\), then \(\pi^*\) is called strongly optimal.

    Six theorems and two algorithms are given: Theorems 1-3 treat the scalar case, \(r(i,a)\in\mathbb{R}\) for \(i\in S\), \(a\in A(i)\), and Theorems 4-6 treat the vector case. Theorem 1 establishes the existence of a stationary optimal policy \(\pi^*\) satisfying, for all \(i\in S\): \[ V_\alpha(\pi^*,i)=\max_{a\in A(i)}\left[r(i,a)+\beta_\alpha(i,a)\sum_j q(j\mid i,a)V_\alpha(\pi^*,j)\right]. \] Theorems 2 and 3 give optimality results in terms of the histories of the process. Theorem 4 shows that Algorithm 1 produces an optimal policy; the algorithm works, in effect, by finding the intersection of the efficient policy sets over all \(i\in S\). Theorem 5 yields the existence of a stationary optimal policy in the vector case. Theorem 6 gives conditions under which a strongly optimal policy exists, and Algorithm 2 detects such a policy when it does.
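    The scalar optimality equation of Theorem 1 is the fixed point of a standard contraction, so a short value-iteration sketch may help fix ideas. Everything below is hypothetical illustration data, not taken from the paper; convergence relies on \(\beta_\alpha(i,a)<1\), in the spirit of the paper's boundedness assumptions.

```python
# Value iteration for V(i) = max_{a in A(i)} [ r(i,a) + beta(i,a) * sum_j q(j|i,a) V(j) ].
# All numbers below are made-up illustration data, not from the paper.
states = [0, 1]
actions = {0: [0, 1], 1: [0]}                   # A(i): finite action sets
r = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5}     # scalar rewards r(i, a)
beta = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.9}  # discounted mean transition times beta_alpha(i, a)
q = {(0, 0): {0: 0.2, 1: 0.8},                  # transition probabilities q(j | i, a)
     (0, 1): {0: 0.7, 1: 0.3},
     (1, 0): {0: 1.0}}

def backup(V, i, a):
    """One-step return r(i,a) + beta(i,a) * sum_j q(j|i,a) V(j)."""
    return r[i, a] + beta[i, a] * sum(p * V[j] for j, p in q[i, a].items())

def value_iteration(tol=1e-10):
    V = {i: 0.0 for i in states}
    while True:
        V_new = {i: max(backup(V, i, a) for a in actions[i]) for i in states}
        if max(abs(V_new[i] - V[i]) for i in states) < tol:
            # A stationary optimal policy takes a maximizing action in each state.
            pi = {i: max(actions[i], key=lambda a: backup(V_new, i, a)) for i in states}
            return V_new, pi
        V = V_new

V, pi = value_iteration()
print(V, pi)
```

    In the vector case, efficiency is a componentwise (Pareto) comparison, and the intersection step attributed to Algorithm 1 can be mimicked for stationary policies by evaluating each one exactly and intersecting the per-state efficient sets. Again a sketch on hypothetical data, restricted to stationary policies for illustration; the paper's Algorithm 1 itself is not reproduced here.

```python
import itertools
import numpy as np

# Hypothetical two-state instance with two-component vector rewards; illustration only.
states = [0, 1]
actions = {0: [0, 1], 1: [0, 1]}
r = {(0, 0): np.array([1.0, 0.0]), (0, 1): np.array([0.0, 1.0]),
     (1, 0): np.array([0.5, 0.5]), (1, 1): np.array([0.2, 0.8])}
beta = {(i, a): 0.9 for i in states for a in actions[i]}
q = {(0, 0): [0.2, 0.8], (0, 1): [0.7, 0.3],
     (1, 0): [1.0, 0.0], (1, 1): [0.5, 0.5]}

def evaluate(pi):
    """Solve (I - B P) V = r_pi componentwise for a stationary policy pi."""
    BP = np.array([[beta[i, pi[i]] * q[i, pi[i]][j] for j in states] for i in states])
    R = np.stack([r[i, pi[i]] for i in states])          # |S| x 2 reward matrix
    return np.linalg.solve(np.eye(len(states)) - BP, R)  # row i is V_alpha(pi, i)

values = {pi: evaluate(dict(enumerate(pi)))
          for pi in itertools.product(*(actions[i] for i in states))}

def efficient_at(i):
    """Stationary policies whose value vector at state i is undominated."""
    return {p for p, V in values.items()
            if not any(np.all(W[i] >= V[i]) and np.any(W[i] > V[i])
                       for W in values.values())}

# Intersect the efficient sets over all states, in the spirit of Algorithm 1.
print(set.intersection(*(efficient_at(i) for i in states)))
```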
    vector-valued semi-Markov decision processes
    countable state set
    finite action set
    existence of a stationary optimal policy