Denumerable state nonhomogeneous Markov decision processes (Q805501): Difference between revisions

From MaRDI portal
RedirectionBot (talk | contribs)
Changed an Item
ReferenceBot (talk | contribs)
Changed an Item
 
(One intermediate revision by one other user not shown)
Property / MaRDI profile type
 
Property / MaRDI profile type: MaRDI publication profile / rank
 
Normal rank
Property / cites work
 
Property / cites work: Conditions for the Existence of Planning Horizons / rank
 
Normal rank
Property / cites work
 
Property / cites work: An on-line procedure in discounted infinite-horizon stochastic optimal control / rank
 
Normal rank
Property / cites work
 
Property / cites work: Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state / rank
 
Normal rank
Property / cites work
 
Property / cites work: Contraction mappings underlying undiscounted Markov decision problems / rank
 
Normal rank
Property / cites work
 
Property / cites work: A forecast horizon and a stopping rule for general Markov decision processes / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q3206683 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Technical Note—Identifying Forecast Horizons in Nonhomogeneous Markov Decision Processes / rank
 
Normal rank
Property / cites work
 
Property / cites work: A New Optimality Criterion for Nonhomogeneous Markov Decision Processes / rank
 
Normal rank
Property / cites work
 
Property / cites work: On Two Recent Papers on Ergodicity in Nonhomogeneous Markov Chains / rank
 
Normal rank
Property / cites work
 
Property / cites work: Non-Discounted Denumerable Markovian Decision Models / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q3990571 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Turnpike Planning Horizons for a Markovian Decision Model / rank
 
Normal rank

Latest revision as of 17:32, 21 June 2024

scientific article
Language Label Description Also known as
English
Denumerable state nonhomogeneous Markov decision processes
scientific article

    Statements

    Denumerable state nonhomogeneous Markov decision processes (English)
    0 references
    0 references
    0 references
    0 references
    1990
    0 references
    The framework for this paper is: a denumerable number of states, \(\{\) \(i\}\) ; an infinite number of decision stages, \(\{\) \(k\}\) ; a finite set of actions, \(\{x^ i_ k\}\) for each i and k, and a set of infinite horizon strategies \(\{\) \(x\}\), whose k-th member is a policy k; a set of transition probability matrices \(\{P_ k(x_ k)\}\); a set of immediate reward vectors \(\{R_ k(x_ k)\}\); a discount factor \(0\leq \alpha \leq 1\); a set of expected discounted reward vectors, \(\{V_ 0(x,N)\}\) covering the first N decision stages if strategy x (a finite strategy, x(N), truncated to the first N decision stages) is used. Discount \((\alpha <1)\) and average \((\alpha =1\), lim.inf) optimality are defined for \(N\to \infty\). A strategy, \(x^*\), is algorithmically optimal, if there is a sequence \(\{N_ m\}\) such that \(\{x^*(N_ m)\}\) (i.e. optimal \(N_ m\)-stage strategies) converge to \(x^*\) in the topology induced by a specified metric p(.,.). The paper deals with algorithmic optimality and its relationship with other constructs. Various assumptions are made and each result is governed by some of these assumptions. Theorem 1 demonstrates the existence of algorithmically optimal strategies. Theorem 2 demonstrates that algorithmic optimality implies discount optimality. Theorem 5 demonstrates that algorithmic optimality implies average optimality. Theorem 6 demonstrates that an algorithmically optimal strategy is unique if and only if \(\{x^*(N)\}\) converges to \(x^*\), in the topology induced by the p(.,.) metric, for all choices of \(\{x^*(N)\}.\) Finally, using relative values, solution horizon extensions of some earlier work are given, in terms of algorithmic optimality, and an algorithm is given for finding, in a finite number of iterations, a first policy, \(x^*_ 0\), of an algorithmically optimal strategy \(x^*\).
    0 references
    0 references
    denumerable number of states
    0 references
    infinite number of decision stages
    0 references
    finite set of actions
    0 references
    infinite horizon strategies
    0 references
    algorithmic optimality
    0 references
    algorithmically optimal strategies
    0 references
    solution horizon extensions
    0 references