A unified approach to adaptive control of average reward Markov decision processes (Q1095048)
From MaRDI portal
scientific article
A unified approach to adaptive control of average reward Markov decision processes (English)
1988
The paper presents a general optimization method for adaptive average-reward Markov decision problems. Optimal decisions are determined by applying, after each observation of the state and each estimate of the unknown parameter, a policy-improvement step to an auxiliary value function that converges over time to the true relative value function. This method subsumes the classical procedure of estimation and control [cf. \textit{M. Kurano}, J. Oper. Res. Soc. Japan 15, 67-76 (1972; Zbl 0238.90006), and \textit{P. Mandl}, Adv. Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)] and nonstationary value iteration [cf. \textit{A. Federgruen} and \textit{P. J. Schweitzer}, J. Optimization Theory Appl. 34, 207-241 (1981; Zbl 0457.90083), \textit{R. S. Acosta-Abreu} and \textit{O. Hernandez-Lerma}, Control Cybern. 14, 313-322 (1985; Zbl 0606.90130), and \textit{M. Kurano}, J. Appl. Probab. 24, 270-276 (1987)], and it yields many new procedures as well.
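The scheme the review describes, alternating parameter estimation with a policy-improvement step on an auxiliary (relative) value function, can be sketched as follows. This is a minimal illustration, not the paper's construction: the two-state, two-action MDP, the single unknown transition parameter `THETA_TRUE`, and all function names are hypothetical choices made for the example.

```python
import random

# Hypothetical 2-state, 2-action average-reward MDP whose transitions
# depend on one unknown parameter theta (true value 0.7, to be estimated).
THETA_TRUE = 0.7
R = [[1.0, 0.0], [0.0, 2.0]]  # known rewards r[state][action]

def p_next0(theta, s, a):
    # Probability of moving to state 0: action 0 succeeds w.p. theta,
    # action 1 moves to state 1 w.p. theta (illustrative dynamics).
    return theta if a == 0 else 1.0 - theta

def adaptive_control(T=5000, seed=0):
    rng = random.Random(seed)
    succ, tot = 1.0, 2.0      # smoothed counts for estimating theta
    h = [0.0, 0.0]            # auxiliary relative value function
    s, avg_reward = 0, 0.0
    for t in range(1, T + 1):
        theta_hat = succ / tot
        # Policy-improvement step: greedy action under the estimated model
        # and the current auxiliary value function h.
        q = []
        for a in (0, 1):
            p0 = p_next0(theta_hat, s, a)
            q.append(R[s][a] + p0 * h[0] + (1 - p0) * h[1])
        a = max((0, 1), key=lambda x: q[x])
        # Nonstationary value-iteration update of h (normalized at state 0),
        # so h tracks the relative values as the estimate theta_hat evolves.
        new_h = []
        for ss in (0, 1):
            vals = []
            for aa in (0, 1):
                p0 = p_next0(theta_hat, ss, aa)
                vals.append(R[ss][aa] + p0 * h[0] + (1 - p0) * h[1])
            new_h.append(max(vals))
        h = [v - new_h[0] for v in new_h]
        # Observe the true transition and update the estimate of theta.
        s_next = 0 if rng.random() < p_next0(THETA_TRUE, s, a) else 1
        event = (s_next == 0) if a == 0 else (s_next == 1)
        succ += 1.0 if event else 0.0
        tot += 1.0
        avg_reward += (R[s][a] - avg_reward) / t
        s = s_next
    return theta_hat, avg_reward
```

Every transition informs the estimate regardless of the action taken, so `theta_hat` converges to the true parameter while the greedy policy adapts to it, mirroring the "estimation and control" loop the review attributes to the classical procedures.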
adaptive control
adaptive average reward Markov decision
policy improvement
nonstationary value iteration