A time aggregation approach to Markov decision processes (Q1614322): Difference between revisions

An infinite horizon average cost control problem for discrete time ergodic Markov chains is considered. A time aggregation approach is proposed, by which policy iteration of the original problem is replaced by a series of policy iterations on nonintersecting subsets of the state space, using the associated embedded Markov chains and equivalent performance functions. Single sample path-based estimation algorithms are presented. The results are illustrated by numerical and simulation examples.
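As a rough illustration of the embedded Markov chains mentioned in the abstract, the Python sketch below computes the transition matrix of a chain observed only at its visits to a chosen subset of states. It uses the standard censored-chain identity P11 + P12 (I - P22)^{-1} P21 and is not code from the paper. The function name `embedded_chain` and the 3-state matrix are assumptions made for the example.

```python
from fractions import Fraction


def embedded_chain(P, subset):
    """Transition matrix of the chain watched only while it is in `subset`.

    Hypothetical helper: P is a row-stochastic matrix (list of lists of
    Fractions) and `subset` lists the retained state indices. Assumes the
    chain is ergodic, so that I - P22 is invertible.
    """
    n = len(P)
    inside = list(subset)
    outside = [s for s in range(n) if s not in inside]
    m = len(outside)

    # Augmented system [I - P22 | P21], solved by Gauss-Jordan elimination.
    A = [
        [Fraction(int(i == j)) - P[outside[i]][outside[j]] for j in range(m)]
        + [Fraction(P[outside[i]][t]) for t in inside]
        for i in range(m)
    ]
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]

    # X[i][k]: probability that, starting from outside[i], the chain
    # first re-enters the subset at inside[k].
    X = [row[m:] for row in A]

    # Direct transitions within the subset plus excursions through the rest.
    return [
        [P[s][t] + sum(P[s][outside[i]] * X[i][k] for i in range(m))
         for k, t in enumerate(inside)]
        for s in inside
    ]


h = Fraction(1, 2)
P = [[h, h, 0],
     [0, h, h],
     [h, 0, h]]
Q = embedded_chain(P, [0, 1])  # chain observed on states 0 and 1 only
```

Exact fractions are used here so that the rows of `Q` can be checked to sum to one without rounding error. A floating-point linear solver would be the more usual choice for larger chains.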

0 references

Mathematics Subject Classification ID

93E20

0 references

sample path estimation algorithms

0 references

discrete time ergodic Markov chains

0 references

reviewed by

H. Pragarauskas

0 references

MaRDI profile type

MaRDI publication profile

0 references

cites work

Aggregation of the policy iteration method for nearly completely decomposable Markov chains

0 references

Q4257216

0 references

The relations among potentials, perturbation analysis, and Markov decision processes

0 references

Single sample path-based optimization of Markov chains

0 references

A unified approach to Markov decision problems and performance sensitivity analysis

0 references

Dependability for systems with a partitioned state space: Markov and semi-Markov theory and computational implementation

0 references

Multilayer control of large Markov chains

0 references

Simulation-based optimization of Markov reward processes

0 references

Q3326564

0 references

Using Randomization to Break the Curse of Dimensionality

0 references

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

0 references

Average cost temporal-difference learning

0 references

\({\mathcal Q}\)-learning

0 references

Performance gradient estimation for the very large finite Markov chains

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:1614322

@@ Property / author @@
-Cao, Xiren
@@ Property / author: Cao, Xiren / rank @@
-Normal rank
@@ Property / author @@
-Michael C. Fu
@@ Property / author: Michael C. Fu / rank @@
-Normal rank
@@ Property / author @@
-Steven I. Marcus
@@ Property / author: Steven I. Marcus / rank @@
-Normal rank
@@ Property / reviewed by @@
-H. Pragarauskas
@@ Property / reviewed by: H. Pragarauskas / rank @@
-Normal rank
@@ Property / author @@
+Cao, Xiren
@@ Property / author: Cao, Xiren / rank @@
+Normal rank
@@ Property / author @@
+Michael C. Fu
@@ Property / author: Michael C. Fu / rank @@
+Normal rank
@@ Property / author @@
+Steven I. Marcus
@@ Property / author: Steven I. Marcus / rank @@
+Normal rank
@@ Property / reviewed by @@
+H. Pragarauskas
@@ Property / reviewed by: H. Pragarauskas / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / cites work @@
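+Aggregation of the policy iteration method for nearly completely decomposable Markov chains
@@ Property / cites work: Aggregation of the policy iteration method for nearly completely decomposable Markov chains / rank @@
+Normal rank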
@@ Property / cites work @@
+Q4257216
@@ Property / cites work: Q4257216 / rank @@
+Normal rank
@@ Property / cites work @@
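+The relations among potentials, perturbation analysis, and Markov decision processes
@@ Property / cites work: The relations among potentials, perturbation analysis, and Markov decision processes / rank @@
+Normal rank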
@@ Property / cites work @@
+Single sample path-based optimization of Markov chains
@@ Property / cites work: Single sample path-based optimization of Markov chains / rank @@
+Normal rank
@@ Property / cites work @@
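+A unified approach to Markov decision problems and performance sensitivity analysis
@@ Property / cites work: A unified approach to Markov decision problems and performance sensitivity analysis / rank @@
+Normal rank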
@@ Property / cites work @@
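+Dependability for systems with a partitioned state space: Markov and semi-Markov theory and computational implementation
@@ Property / cites work: Dependability for systems with a partitioned state space: Markov and semi-Markov theory and computational implementation / rank @@
+Normal rank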
@@ Property / cites work @@
+Multilayer control of large Markov chains
@@ Property / cites work: Multilayer control of large Markov chains / rank @@
+Normal rank
@@ Property / cites work @@
+Simulation-based optimization of Markov reward processes
@@ Property / cites work: Simulation-based optimization of Markov reward processes / rank @@
+Normal rank
@@ Property / cites work @@
+Q3326564
@@ Property / cites work: Q3326564 / rank @@
+Normal rank
@@ Property / cites work @@
+Using Randomization to Break the Curse of Dimensionality
@@ Property / cites work: Using Randomization to Break the Curse of Dimensionality / rank @@
+Normal rank
@@ Property / cites work @@
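+Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
@@ Property / cites work: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning / rank @@
+Normal rank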
@@ Property / cites work @@
+Average cost temporal-difference learning
@@ Property / cites work: Average cost temporal-difference learning / rank @@
+Normal rank
@@ Property / cites work @@
+\({\mathcal Q}\)-learning
@@ Property / cites work: \({\mathcal Q}\)-learning / rank @@
+Normal rank
@@ Property / cites work @@
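+Performance gradient estimation for the very large finite Markov chains
@@ Property / cites work: Performance gradient estimation for the very large finite Markov chains / rank @@
+Normal rank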
@@ links / mardi / name @@
+Publication:1614322