A unified approach to adaptive control of average reward Markov decision processes (Q1095048)

The paper presents a general optimization method for adaptive average reward Markov decision problems. After each observation of the state and each estimation of the unknown parameter, optimal decisions are determined by applying a policy improvement step to an auxiliary value function, which converges to the true relative value function as time increases. This method includes the classical procedure of estimation and control [cf. \textit{M. Kurano}, J. Oper. Res. Soc. Japan 15, 67-76 (1972; Zbl 0238.90006), and \textit{P. Mandl}, Adv. Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)] and nonstationary value iteration [cf. \textit{A. Federgruen} and \textit{P. J. Schweitzer}, J. Optimization Theory Appl. 34, 207-241 (1981; Zbl 0457.90083), \textit{R. S. Acosta-Abreu} and \textit{O. Hernández-Lerma}, Control Cybern. 14, 313-322 (1985; Zbl 0606.90130), and \textit{M. Kurano}, J. Appl. Probab. 24, 270-276 (1987)], and it also yields many new procedures.
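
The scheme summarized above (estimate the unknown parameter, update an auxiliary value function, then take a policy improvement step) can be illustrated with a minimal Python sketch. It is not the paper's algorithm and does not check its convergence conditions. Several assumptions are made. State and action spaces are finite and rewards r[s, a] are known. The unknown parameter is the transition law, estimated from empirical transition frequencies with one pseudo-count per transition (a hypothetical choice). The auxiliary value function is a relative value-iteration iterate normalized to 0 at state 0. Mode "nvi" performs one relative value-iteration step per observation, in the spirit of nonstationary value iteration. Mode "ce" iterates many steps on the current estimate to approximate the classical estimation-and-control procedure. The function names bellman, relative_vi_step, greedy_action and run_adaptive are hypothetical.

```python
import numpy as np


def bellman(h, P_hat, r):
    """(T h)(s) = max_a [ r(s, a) + sum_j P_hat(s, a, j) h(j) ] for every state s."""
    # P_hat has shape (S, A, S) and h has shape (S,), so P_hat @ h has shape (S, A).
    return np.max(r + P_hat @ h, axis=1)


def relative_vi_step(h, P_hat, r, ref=0):
    """One relative value-iteration step, normalized to 0 at state `ref`."""
    th = bellman(h, P_hat, r)
    return th - th[ref]


def greedy_action(state, h, P_hat, r):
    """Policy improvement step: an action maximizing r(s, a) + sum_j P_hat(s, a, j) h(j)."""
    q = r[state] + P_hat[state] @ h  # shape (A,)
    return int(np.argmax(q))  # ties go to the lowest action index


def run_adaptive(P_true, r, T=5000, mode="nvi", ce_iters=200, seed=0):
    """Simulate T steps of a hypothetical adaptive controller; return its average reward.

    mode="nvi": one relative value-iteration step per observation
                (in the spirit of nonstationary value iteration).
    mode="ce":  ce_iters relative value-iteration steps on the current estimate
                (an approximation of "estimation and control").
    """
    if mode not in ("nvi", "ce"):
        raise ValueError(f"unknown mode: {mode!r}")
    rng = np.random.default_rng(seed)
    S, A, _ = P_true.shape
    # Assumed uniform prior: one pseudo-count per transition keeps every estimated
    # probability positive, a standard sufficient condition for relative VI to converge.
    counts = np.ones((S, A, S))
    h = np.zeros(S)  # auxiliary (relative) value function
    s = 0
    total = 0.0
    for t in range(T):
        # Estimate the unknown parameter (here: the transition law) from the data so far.
        P_hat = counts / counts.sum(axis=2, keepdims=True)
        # Update the auxiliary value function for the current estimate.
        n_steps = 1 if mode == "nvi" else ce_iters
        for _ in range(n_steps):
            h = relative_vi_step(h, P_hat, r)
        # Policy improvement step at the observed state.
        a = greedy_action(s, h, P_hat, r)
        total += r[s, a]
        # Observe the next state under the true (unknown) transition law.
        s_next = rng.choice(S, p=P_true[s, a])
        counts[s, a, s_next] += 1
        s = s_next
    return total / T
```

Neither variant includes forced exploration. As in the estimation-and-control literature, convergence to an optimal policy therefore generally requires additional identifiability conditions, which this sketch does not verify.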

Mathematics Subject Classification ID

90C40

zbMATH DE Number

4027209

zbMATH Keywords

adaptive control

adaptive average reward Markov decision

policy improvement

nonstationary value iteration

MaRDI profile type

MaRDI publication profile

cites work

Q3745652

Q3313754

The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms

Nonstationary Markov decision problems with converging parameters

Contraction mappings underlying undiscounted Markov decision problems

Adaptive control of discounted Markov decision chains

Q5599448

Bounds and good policies in stationary finite–stage Markovian decision problems

Q5649557

Adaptive Policies in Markov Decision Processes with Uncertain Transition Matrices

Learning algorithms for Markov decision processes

Estimation and control in Markov chains

Q3881672

Q4173220
