Denumerable state nonhomogeneous Markov decision processes (Q805501)

The framework for this paper is: a denumerable number of states, \(\{i\}\); an infinite number of decision stages, \(\{k\}\); a finite set of actions, \(\{x^i_k\}\), for each \(i\) and \(k\); a set of infinite-horizon strategies, \(\{x\}\), whose \(k\)-th member \(x_k\) is the policy used at stage \(k\); a set of transition probability matrices, \(\{P_k(x_k)\}\); a set of immediate reward vectors, \(\{R_k(x_k)\}\); a discount factor \(0\leq \alpha \leq 1\); and a set of expected discounted reward vectors, \(\{V_0(x,N)\}\), covering the first \(N\) decision stages when strategy \(x\) is used (equivalently, the finite strategy \(x(N)\) obtained by truncating \(x\) to its first \(N\) stages). Discount optimality \((\alpha<1)\) and average optimality \((\alpha=1\), using the \(\liminf\)) are defined by letting \(N\to\infty\). A strategy \(x^*\) is algorithmically optimal if there is a sequence \(\{N_m\}\) such that \(\{x^*(N_m)\}\), i.e. a sequence of optimal \(N_m\)-stage strategies, converges to \(x^*\) in the topology induced by a specified metric \(p(\cdot,\cdot)\).

The paper studies algorithmic optimality and its relationship with other optimality criteria. Various assumptions are introduced, and each result holds under a subset of them. Theorem 1 establishes the existence of algorithmically optimal strategies. Theorem 2 shows that algorithmic optimality implies discount optimality, and Theorem 5 shows that it implies average optimality. Theorem 6 shows that an algorithmically optimal strategy is unique if and only if every sequence \(\{x^*(N)\}\) of optimal \(N\)-stage strategies converges to \(x^*\) in the topology induced by \(p(\cdot,\cdot)\). Finally, using relative values, the paper extends some earlier solution-horizon results to the setting of algorithmic optimality, and gives an algorithm that finds, in a finite number of iterations, the first policy \(x^*_0\) of an algorithmically optimal strategy \(x^*\).
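The finite-horizon objects above can be made concrete with a short backward-induction sketch. It assumes a finite truncation of the state space (the paper allows denumerably many states), deterministic Markov policies, zero terminal reward, and the standard recursion \(V_k = \max_{x_k}\{R_k(x_k) + \alpha P_k(x_k) V_{k+1}\}\). It returns an optimal \(N\)-stage strategy \(x^*(N)\) together with \(V_0(x^*(N),N)\). The names `solve_n_stage`, `rho`, `P_toy` and `R_toy`, the toy data, and the metric used for `rho` (a weight of \(2^{-(k+1)}\) for each stage where two strategies differ) are all hypothetical, since the review does not state the paper's metric \(p(\cdot,\cdot)\). Watching \(x^*_0\) as \(N\) grows is only a naive illustration of the solution-horizon idea, not the paper's relative-value algorithm.

```python
from typing import Callable, List, Tuple

# Hypothetical finite truncation of the model. P(k)[i][a][j] is the probability
# of moving from state i to state j under action a at stage k; R(k)[i][a] is the
# immediate reward. Both are functions of k, so stage-dependent
# (nonhomogeneous) data can be supplied.
Transition = List[List[List[float]]]
Reward = List[List[float]]
Policy = List[int]  # Policy[i] = action chosen in state i


def solve_n_stage(P: Callable[[int], Transition], R: Callable[[int], Reward],
                  alpha: float, n: int) -> Tuple[List[Policy], List[float]]:
    """Backward induction for the n-stage problem with zero terminal reward.

    Returns an optimal truncated strategy [x_0, ..., x_{n-1}] and the
    expected discounted reward vector V_0 under that strategy.
    """
    n_states = len(R(0))
    value = [0.0] * n_states  # value after the last stage is zero
    strategy: List[Policy] = [[] for _ in range(n)]
    for k in range(n - 1, -1, -1):  # stages n-1, ..., 0
        Pk, Rk = P(k), R(k)
        new_value = [0.0] * n_states
        policy = [0] * n_states
        for i in range(n_states):
            best_a, best_q = 0, float("-inf")
            for a in range(len(Rk[i])):  # action sets may differ by state
                q = Rk[i][a] + alpha * sum(p * v for p, v in zip(Pk[i][a], value))
                if q > best_q:  # strict '>' breaks ties toward the lowest action
                    best_a, best_q = a, q
            policy[i], new_value[i] = best_a, best_q
        strategy[k] = policy
        value = new_value
    return strategy, value


def rho(x: List[Policy], y: List[Policy]) -> float:
    """Assumed metric: sum of 2^-(k+1) over stages k where x_k != y_k.

    Only the stages defined by both (possibly truncated) strategies are compared.
    """
    return sum(0.5 ** (k + 1) for k, (xk, yk) in enumerate(zip(x, y)) if xk != yk)


# Toy two-state instance (hypothetical data, stationary only so that it can be
# checked by hand): action 0 stays put, action 1 switches state; staying pays 1
# in state 0 and 2 in state 1, switching pays 0.
def P_toy(k: int) -> Transition:
    return [[[1.0, 0.0], [0.0, 1.0]],
            [[0.0, 1.0], [1.0, 0.0]]]


def R_toy(k: int) -> Reward:
    return [[1.0, 0.0], [2.0, 0.0]]


if __name__ == "__main__":
    for n in range(1, 7):
        strategy, v0 = solve_n_stage(P_toy, R_toy, 0.9, n)
        print(n, strategy[0], [round(v, 3) for v in v0])
```

With \(\alpha = 0.9\), the first policy is `[0, 0]` for \(N = 1, 2\) and `[1, 0]` for every \(N\) from 3 to 6: a short horizon does not see the value of moving to state 1, but from \(N = 3\) on the first decision stops changing.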

Mathematics Subject Classification ID

90C40

zbMATH DE Number

4204141

zbMATH Keywords

denumerable number of states

infinite number of decision stages

finite set of actions

infinite horizon strategies

algorithmic optimality

algorithmically optimal strategies

solution horizon extensions

reviewed by

Douglas J. White

MaRDI profile type

MaRDI publication profile

cites work

Conditions for the Existence of Planning Horizons

An on-line procedure in discounted infinite-horizon stochastic optimal control

Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state

Contraction mappings underlying undiscounted Markov decision problems

A forecast horizon and a stopping rule for general Markov decision processes

Q3206683

Technical Note—Identifying Forecast Horizons in Nonhomogeneous Markov Decision Processes

A New Optimality Criterion for Nonhomogeneous Markov Decision Processes

On Two Recent Papers on Ergodicity in Nonhomogeneous Markov Chains

Non-Discounted Denumerable Markovian Decision Models

Q3990571

Turnpike Planning Horizons for a Markovian Decision Model

Sitelinks

mardi Publication:805501
