Vector-valued semi-Markovian decision programming (Q1179954)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Vector-valued semi-Markovian decision programming |
scientific article |
Statements
Vector-valued semi-Markovian decision programming (English)
0 references
27 June 1992
0 references
This paper deals with vector-valued semi-Markov decision processes in which: \(S\) is a non-empty countable state set; for each \(i\in S\), \(A(i)\) is a finite action set; \(\alpha\) is a discount parameter; \(\beta_ \alpha(i,a)\) is the expected discounted time between state transitions, given \(\alpha\), \(i\in S\), \(a\in A(i)\); \(r(i,a)\) is the vector-valued reward between successive transitions, given \(i\in S\), \(a\in A(i)\); \(q(j\mid i,a)\) is the transition probability for \(j\in S\), given \(i\in S\), \(a\in A(i)\). Three assumptions are made in order to ensure boundedness in the subsequent analysis. For any policy \(\pi\) in the set of all policies \(\Pi\), \(V_ \alpha(\pi,i)\) is the infinite horizon expected discounted vector-valued reward if the initial state is \(i\in S\). A policy \(\pi^*\in\Pi\) is called efficient in \(\Pi\) with respect to \(i\) if, whenever \(\pi\in\Pi\) and \(V_ \alpha(\pi,i)\geq V_ \alpha(\pi^*,i)\), then \(V_ \alpha(\pi,i)=V_ \alpha(\pi^*,i)\). \(\pi^*\) is called optimal if it is efficient for all \(i\in S\). If \(V_ \alpha(\pi^*,i)\geq V_ \alpha(\pi,i)\) for all \(i\in S\), \(\pi\in\Pi\), then \(\pi^*\) is called strongly optimal. Six theorems and two algorithms are given, where Theorems 1-3 relate to the scalar case, \(r(i,a)\in R\), \(i\in S\), \(a\in A(i)\), and Theorems 4-6 relate to the vector case. Theorem 1 gives the existence of a stationary optimal policy \(\pi^*\), satisfying, for all \(i\in S\): \[ V_ \alpha(\pi^*,i)=\max_{a\in A(i)}\left[r(i,a)+\beta_ \alpha(i,a)\sum_ jq(j\mid i,a)V_ \alpha(\pi^*,j)\right]. \] Theorems 2, 3 give some optimality results in terms of the histories of the process. Theorem 4 demonstrates that a specific algorithm 1 produces an optimal policy (the algorithm works, in effect, by finding the intersection of the efficient policy sets for each \(i\in S)\). Theorem 5 leads to the existence of a stationary optimal policy. Theorem 6 gives conditions for the existence of a strongly optimal policy, and algorithm 2 allows one to detect one if it exists.
0 references
vector-valued semi-Markov decision processes
0 references
countable state set
0 references
finite action set
0 references
existence of a stationary optimal policy
0 references