A time aggregation approach to Markov decision processes (Q1614322): Difference between revisions

From MaRDI portal
Property / author: Cao, Xiren
Property / author: Michael C. Fu
Property / author: Steven I. Marcus
Property / reviewed by: H. Pragarauskas
Property / MaRDI profile type: MaRDI publication profile
Property / cites work: Aggregation of the policy iteration method for nearly completely decomposable Markov chains
Property / cites work: Q4257216
Property / cites work: The relations among potentials, perturbation analysis, and Markov decision processes
Property / cites work: Single sample path-based optimization of Markov chains
Property / cites work: A unified approach to Markov decision problems and performance sensitivity analysis
Property / cites work: Dependability for systems with a partitioned state space: Markov and semi-Markov theory and computational implementation
Property / cites work: Multilayer control of large Markov chains
Property / cites work: Simulation-based optimization of Markov reward processes
Property / cites work: Q3326564
Property / cites work: Using Randomization to Break the Curse of Dimensionality
Property / cites work: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
Property / cites work: Average cost temporal-difference learning
Property / cites work: \({\mathcal Q}\)-learning
Property / cites work: Performance gradient estimation for the very large finite Markov chains

Latest revision as of 16:43, 4 June 2024

Language: English
Label: A time aggregation approach to Markov decision processes
Description: scientific article

    Statements

    A time aggregation approach to Markov decision processes (English)
    5 September 2002
    An infinite-horizon average-cost control problem for discrete-time ergodic Markov chains is considered. A time aggregation approach is proposed in which policy iteration for the original problem is replaced by a series of policy iterations on nonintersecting subsets of the state space, using the associated embedded Markov chains and equivalent performance functions. Estimation algorithms based on a single sample path are presented. The results are illustrated by numerical and simulation examples.
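
    The segmentation idea behind time aggregation can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (`time_aggregate`, `average_cost`, `per_state_estimates`), the toy path, and the choice to start each segment at a visit to the chosen subset are assumptions made for illustration. The sketch cuts a single sample path at visits to a subset of states, records the embedded chain, and accumulates the cost and length of each segment, so that the long-run average cost can be estimated as total segment cost divided by total segment length.

```python
from collections import defaultdict


def time_aggregate(path, costs, subset):
    """Cut a sample path into segments that start at visits to `subset`.

    Returns (embedded_states, segment_costs, segment_lengths).
    Samples before the first visit to `subset` are discarded.
    The last segment may be incomplete (no return to `subset` observed).
    """
    embedded, seg_costs, seg_lens = [], [], []
    for state, cost in zip(path, costs):
        if state in subset:
            # A visit to the subset opens a new segment.
            embedded.append(state)
            seg_costs.append(0.0)
            seg_lens.append(0)
        if embedded:
            # Accumulate cost and elapsed time within the current segment.
            seg_costs[-1] += cost
            seg_lens[-1] += 1
    return embedded, seg_costs, seg_lens


def average_cost(seg_costs, seg_lens):
    """Ratio estimate: total aggregated cost over total elapsed time."""
    return sum(seg_costs) / sum(seg_lens)


def per_state_estimates(embedded, seg_costs, seg_lens):
    """Mean segment cost and mean segment length per embedded state."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # cost, length, count
    for s, c, n in zip(embedded, seg_costs, seg_lens):
        t = totals[s]
        t[0] += c
        t[1] += n
        t[2] += 1
    return {s: (c / k, n / k) for s, (c, n, k) in totals.items()}


# Hypothetical toy path over states {0, 1, 2} with one cost per step.
path = [0, 1, 2, 0, 2, 1, 0, 1]
costs = [1.0, 2.0, 3.0, 1.0, 3.0, 2.0, 1.0, 2.0]
emb, sc, sl = time_aggregate(path, costs, {0})
print(emb, sc, sl)           # [0, 0, 0] [6.0, 6.0, 3.0] [3, 3, 2]
print(average_cost(sc, sl))  # 1.875
```

    On this toy path the ratio estimate equals the plain time average of the costs, because the segments partition the path exactly. The per-state means returned by `per_state_estimates` are the kind of aggregated quantities that a policy iteration restricted to the subset would work with. In practice an incomplete final segment would bias those per-state means and might be dropped.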
    sample path estimation algorithms
    discrete time ergodic Markov chains