Adaptive control of discounted Markov decision chains (Q796461): Difference between revisions

We consider discounted-reward finite-state Markov decision processes that depend on unknown parameters. An adaptive policy inspired by the nonstationary value iteration scheme of \textit{A. Federgruen} and \textit{P. J. Schweitzer} [ibid. 34, 207–241 (1981; Zbl 0426.90091)] is proposed. This policy is briefly compared with the principle of estimation and control recently obtained by \textit{M. Schäl} [Lect. Notes Pure Appl. Math. 86, 239–253 (1983; Zbl 0525.93071)].
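
The idea behind a nonstationary value iteration adaptive policy can be sketched as follows; this is an illustrative reading of the summary above, not the scheme as stated in the paper. At each stage the controller re-estimates the model, applies one value-iteration update under that estimate, and plays the action that is greedy for the update. The function names `nvi_step` and `run_adaptive`, the choice of the transition law itself as the unknown parameter, and the frequency-count estimator with a uniform prior are all assumptions made for this sketch.

```python
import random


def nvi_step(v, P, r, beta):
    """One value-iteration update under the current model estimate.

    v: current value estimates, one per state
    P: P[x][a][y], estimated transition probabilities (hypothetical layout)
    r: r[x][a], one-step rewards, assumed known here
    beta: discount factor in (0, 1)
    Returns (new_v, greedy_actions).
    """
    n = len(v)
    new_v, actions = [], []
    for x in range(n):
        # Estimated discounted return of each action in state x.
        q = [r[x][a] + beta * sum(P[x][a][y] * v[y] for y in range(n))
             for a in range(len(r[x]))]
        best = max(range(len(q)), key=q.__getitem__)
        new_v.append(q[best])
        actions.append(best)
    return new_v, actions


def run_adaptive(true_P, r, beta, steps, seed=0):
    """Simulate the adaptive loop: estimate, update once, act greedily."""
    rng = random.Random(seed)
    n, m = len(r), len(r[0])
    # Assumption: transition counts start from a uniform prior of 1.
    counts = [[[1.0] * n for _ in range(m)] for _ in range(n)]
    v = [0.0] * n
    greedy = [0] * n
    x = 0
    for _ in range(steps):
        # Current estimate of the transition law from observed frequencies.
        P_hat = [[[c / sum(row) for c in row] for row in counts[s]]
                 for s in range(n)]
        # One update per stage, with a model that changes over time.
        v, greedy = nvi_step(v, P_hat, r, beta)
        a = greedy[x]
        # The true law is used only to simulate the next state.
        y = rng.choices(range(n), weights=true_P[x][a])[0]
        counts[x][a][y] += 1.0
        x = y
    return v, greedy
```

Because only one update is performed per stage, the value estimates and the model estimate are refined together rather than solving a full dynamic program after every observation. A purely greedy rule like this one does not by itself guarantee that every state-action pair is visited often enough for the estimates to converge; the sketch makes no claim on that point.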

zbMATH Keywords

discounted-reward finite-state Markov decision processes

adaptive policy

nonstationary value iteration

Identifiers

zbMATH Open document ID

0543.90093

DOI

10.1007/BF00938426

Mathematics Subject Classification ID

90C40

zbMATH DE Number

3865009

Sitelinks

Mathematics (1 entry)

mardi Publication:796461

Revision as of 04:32, 11 February 2024, RedirectionBot: Removed claim: author (P16): Item:Q217261
Revision as of 04:32, 11 February 2024, RedirectionBot: Changed an Item
	Property / author
		Onésimo Hernández-Lerma
	Property / author: Onésimo Hernández-Lerma / rank
		Normal rank