Vector-valued semi-Markovian decision programming (Q1179954)

scientific article
    Statements

    Vector-valued semi-Markovian decision programming (English)
    27 June 1992
    This paper deals with vector-valued semi-Markov decision processes in which \(S\) is a non-empty countable state set; for each \(i\in S\), \(A(i)\) is a finite action set; \(\alpha\) is a discount parameter; \(\beta_\alpha(i,a)\) is the expected discounted time between state transitions, given \(\alpha\), \(i\in S\) and \(a\in A(i)\); \(r(i,a)\) is the vector-valued reward between successive transitions, given \(i\in S\) and \(a\in A(i)\); and \(q(j\mid i,a)\) is the probability of a transition to \(j\in S\), given \(i\in S\) and \(a\in A(i)\). Three assumptions are made to ensure boundedness in the subsequent analysis.

    For any policy \(\pi\) in the set \(\Pi\) of all policies, \(V_\alpha(\pi,i)\) is the infinite-horizon expected discounted vector-valued reward when the initial state is \(i\in S\). A policy \(\pi^*\in\Pi\) is called efficient in \(\Pi\) with respect to \(i\) if, whenever \(\pi\in\Pi\) and \(V_\alpha(\pi,i)\geq V_\alpha(\pi^*,i)\), then \(V_\alpha(\pi,i)=V_\alpha(\pi^*,i)\); \(\pi^*\) is called optimal if it is efficient with respect to every \(i\in S\). If \(V_\alpha(\pi^*,i)\geq V_\alpha(\pi,i)\) for all \(i\in S\) and all \(\pi\in\Pi\), then \(\pi^*\) is called strongly optimal.

    Six theorems and two algorithms are given: Theorems 1-3 treat the scalar case, \(r(i,a)\in\mathbb{R}\) for \(i\in S\), \(a\in A(i)\), and Theorems 4-6 treat the vector case. Theorem 1 establishes the existence of a stationary optimal policy \(\pi^*\) satisfying, for all \(i\in S\): \[ V_\alpha(\pi^*,i)=\max_{a\in A(i)}\left[r(i,a)+\beta_\alpha(i,a)\sum_j q(j\mid i,a)V_\alpha(\pi^*,j)\right]. \] Theorems 2 and 3 give optimality results in terms of the histories of the process. Theorem 4 shows that Algorithm 1 produces an optimal policy; the algorithm works, in effect, by finding the intersection of the efficient policy sets over all \(i\in S\). Theorem 5 yields the existence of a stationary optimal policy in the vector case. Theorem 6 gives conditions under which a strongly optimal policy exists, and Algorithm 2 detects such a policy when it does.
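    The scalar optimality equation of Theorem 1 is the fixed point of a standard contraction, so a short value-iteration sketch may help fix ideas. Everything below is hypothetical illustration data, not taken from the paper; convergence relies on \(\beta_\alpha(i,a)<1\), in the spirit of the paper's boundedness assumptions.

```python
# Value iteration for V(i) = max_{a in A(i)} [ r(i,a) + beta(i,a) * sum_j q(j|i,a) V(j) ].
# All numbers below are made-up illustration data, not from the paper.
states = [0, 1]
actions = {0: [0, 1], 1: [0]}                   # A(i): finite action sets
r = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5}     # scalar rewards r(i, a)
beta = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.9}  # discounted mean transition times beta_alpha(i, a)
q = {(0, 0): {0: 0.2, 1: 0.8},                  # transition probabilities q(j | i, a)
     (0, 1): {0: 0.7, 1: 0.3},
     (1, 0): {0: 1.0}}

def backup(V, i, a):
    """One-step return r(i,a) + beta(i,a) * sum_j q(j|i,a) V(j)."""
    return r[i, a] + beta[i, a] * sum(p * V[j] for j, p in q[i, a].items())

def value_iteration(tol=1e-10):
    V = {i: 0.0 for i in states}
    while True:
        V_new = {i: max(backup(V, i, a) for a in actions[i]) for i in states}
        if max(abs(V_new[i] - V[i]) for i in states) < tol:
            # A stationary optimal policy takes a maximizing action in each state.
            pi = {i: max(actions[i], key=lambda a: backup(V_new, i, a)) for i in states}
            return V_new, pi
        V = V_new

V, pi = value_iteration()
print(V, pi)
```

    In the vector case, efficiency is a componentwise (Pareto) comparison, and the intersection step attributed to Algorithm 1 can be mimicked for stationary policies by evaluating each one exactly and intersecting the per-state efficient sets. Again a sketch on hypothetical data, restricted to stationary policies for illustration; the paper's Algorithm 1 itself is not reproduced here.

```python
import itertools
import numpy as np

# Hypothetical two-state instance with two-component vector rewards; illustration only.
states = [0, 1]
actions = {0: [0, 1], 1: [0, 1]}
r = {(0, 0): np.array([1.0, 0.0]), (0, 1): np.array([0.0, 1.0]),
     (1, 0): np.array([0.5, 0.5]), (1, 1): np.array([0.2, 0.8])}
beta = {(i, a): 0.9 for i in states for a in actions[i]}
q = {(0, 0): [0.2, 0.8], (0, 1): [0.7, 0.3],
     (1, 0): [1.0, 0.0], (1, 1): [0.5, 0.5]}

def evaluate(pi):
    """Solve (I - B P) V = r_pi componentwise for a stationary policy pi."""
    BP = np.array([[beta[i, pi[i]] * q[i, pi[i]][j] for j in states] for i in states])
    R = np.stack([r[i, pi[i]] for i in states])          # |S| x 2 reward matrix
    return np.linalg.solve(np.eye(len(states)) - BP, R)  # row i is V_alpha(pi, i)

values = {pi: evaluate(dict(enumerate(pi)))
          for pi in itertools.product(*(actions[i] for i in states))}

def efficient_at(i):
    """Stationary policies whose value vector at state i is undominated."""
    return {p for p, V in values.items()
            if not any(np.all(W[i] >= V[i]) and np.any(W[i] > V[i])
                       for W in values.values())}

# Intersect the efficient sets over all states, in the spirit of Algorithm 1.
print(set.intersection(*(efficient_at(i) for i in states)))
```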
    vector-valued semi-Markov decision processes
    countable state set
    finite action set
    existence of a stationary optimal policy