Vector-valued semi-Markovian decision programming (Q1179954)
From MaRDI portal
Reviewed by: Douglas J. White
Language | Label | Description | Also known as
---|---|---|---
English | Vector-valued semi-Markovian decision programming | scientific article |
Statements
Vector-valued semi-Markovian decision programming (English)
27 June 1992
This paper deals with vector-valued semi-Markov decision processes in which: \(S\) is a non-empty countable state set; for each \(i\in S\), \(A(i)\) is a finite action set; \(\alpha\) is a discount parameter; \(\beta_\alpha(i,a)\) is the expected discounted time between state transitions, given \(\alpha\), \(i\in S\), \(a\in A(i)\); \(r(i,a)\) is the vector-valued reward between successive transitions, given \(i\in S\), \(a\in A(i)\); and \(q(j\mid i,a)\) is the transition probability to \(j\in S\), given \(i\in S\), \(a\in A(i)\). Three assumptions are made in order to ensure boundedness in the subsequent analysis.

For any policy \(\pi\) in the set of all policies \(\Pi\), \(V_\alpha(\pi,i)\) is the infinite-horizon expected discounted vector-valued reward when the initial state is \(i\in S\). A policy \(\pi^*\in\Pi\) is called efficient in \(\Pi\) with respect to \(i\) if, whenever \(\pi\in\Pi\) and \(V_\alpha(\pi,i)\geq V_\alpha(\pi^*,i)\), then \(V_\alpha(\pi,i)=V_\alpha(\pi^*,i)\). \(\pi^*\) is called optimal if it is efficient for all \(i\in S\). If \(V_\alpha(\pi^*,i)\geq V_\alpha(\pi,i)\) for all \(i\in S\) and all \(\pi\in\Pi\), then \(\pi^*\) is called strongly optimal.

Six theorems and two algorithms are given; Theorems 1-3 treat the scalar case, \(r(i,a)\in\mathbb{R}\), \(i\in S\), \(a\in A(i)\), and Theorems 4-6 treat the vector case. Theorem 1 establishes the existence of a stationary optimal policy \(\pi^*\) satisfying, for all \(i\in S\): \[ V_\alpha(\pi^*,i)=\max_{a\in A(i)}\left[r(i,a)+\beta_\alpha(i,a)\sum_j q(j\mid i,a)V_\alpha(\pi^*,j)\right]. \] Theorems 2 and 3 give some optimality results in terms of the histories of the process. Theorem 4 shows that a specific Algorithm 1 produces an optimal policy (the algorithm works, in effect, by finding the intersection of the efficient policy sets for each \(i\in S\)). Theorem 5 yields the existence of a stationary optimal policy. Theorem 6 gives conditions for the existence of a strongly optimal policy, and Algorithm 2 allows one to detect such a policy if it exists.
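To make the scalar optimality equation of Theorem 1 concrete, the following is a minimal value-iteration sketch in Python. It is not the paper's Algorithm 1 or 2; the finite state set, action sets, rewards \(r(i,a)\), discounted transition-time factors \(\beta_\alpha(i,a)\), and transition probabilities \(q(j\mid i,a)\) below are illustrative placeholders, and a finite \(S\) is used in place of the countable state set of the paper.

```python
# Sketch of value iteration for the scalar optimality equation of Theorem 1:
#   V(i) = max_{a in A(i)} [ r(i,a) + beta_alpha(i,a) * sum_j q(j|i,a) V(j) ]
# All model data below are illustrative placeholders, not taken from the paper.

states = [0, 1]                        # S (finite here for simplicity)
actions = {0: ["a", "b"], 1: ["a"]}    # A(i)
r = {(0, "a"): 1.0, (0, "b"): 0.5, (1, "a"): 2.0}       # reward between transitions
beta = {(0, "a"): 0.9, (0, "b"): 0.8, (1, "a"): 0.85}   # expected discounted transition time
q = {(0, "a"): {0: 0.3, 1: 0.7},       # q(j | i, a)
     (0, "b"): {0: 0.6, 1: 0.4},
     (1, "a"): {0: 0.5, 1: 0.5}}

# Iterate the dynamic-programming operator until (numerical) convergence.
V = {i: 0.0 for i in states}
for _ in range(1000):
    V_new = {
        i: max(
            r[(i, a)] + beta[(i, a)] * sum(q[(i, a)][j] * V[j] for j in states)
            for a in actions[i]
        )
        for i in states
    }
    if max(abs(V_new[i] - V[i]) for i in states) < 1e-10:
        V = V_new
        break
    V = V_new

# A stationary policy that is greedy with respect to the fixed point,
# in the spirit of the stationary optimal policy of Theorem 1.
policy = {
    i: max(actions[i],
           key=lambda a: r[(i, a)] + beta[(i, a)] * sum(q[(i, a)][j] * V[j] for j in states))
    for i in states
}
print(V, policy)
```

In the vector-valued case the maximum above is no longer well defined, which is why the paper works with efficient (Pareto-optimal) policies per state and, in Algorithm 1, intersects the efficient policy sets over all \(i\in S\).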
vector-valued semi-Markov decision processes
countable state set
finite action set
existence of a stationary optimal policy