Nonparametric estimation and adaptive control in a class of finite Markov decision chains (Q1174701)

From MaRDI portal

scientific article

Language	Label	Description	Also known as
English	Nonparametric estimation and adaptive control in a class of finite Markov decision chains	scientific article

Statements

scholarly article

Nonparametric estimation and adaptive control in a class of finite Markov decision chains (English)

Rolando Cavazos-Cadena

Annals of Operations Research

publication date

25 June 1992

A discounted Markov decision problem with finite state and action spaces is considered in which the transition law is completely unknown. The transition law is estimated sequentially while the system is being controlled. It is assumed that the state space is irreducible under every stationary policy. The usual control policies, derived from the principle of estimation and control (optimality with respect to the present estimate) and from nonstationary value iteration (one-step improvement using the present estimate), are modified by choosing any other action with a small (decreasing) probability. These modified policies are shown to lead to strongly consistent estimators and to be asymptotically discount optimal. The problem is similar to that of \textit{M. Kurano} [J. Appl. Probab. 24, 270-276 (1987; Zbl 0631.90085)], where the average case is treated and the reward is random, too.
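
The scheme summarized in the review can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the construction from the article: the function name `adaptive_nvi`, the exploration schedule 1/sqrt(n+1), the uniform initial guess for unvisited state-action pairs, and the two-state example data are all hypothetical choices made for illustration. It performs one value-iteration backup per step using the current frequency estimate of the transition law, takes the resulting greedy action, and with a small decreasing probability switches to another action.

```python
import random


def adaptive_nvi(true_p, reward, beta, steps, seed=0):
    """Simulate a modified nonstationary-value-iteration policy (illustrative).

    true_p[x][a] is the transition row hidden from the controller,
    reward[x][a] is the known one-step reward, beta the discount factor.
    Returns (p_hat, counts), where p_hat[x][a][y] is the frequency estimate.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(true_p), len(true_p[0])
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    # Assumption of this sketch: uniform guess for pairs not yet visited.
    p_hat = [[[1.0 / n_states] * n_states for _ in range(n_actions)]
             for _ in range(n_states)]
    v = [0.0] * n_states
    x = 0
    for n in range(steps):
        # One Bellman backup with the present estimate (one-step improvement).
        q = [[reward[s][a] + beta * sum(p_hat[s][a][y] * v[y] for y in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        v = [max(row) for row in q]
        greedy = max(range(n_actions), key=lambda a: q[x][a])
        # Assumed exploration schedule: small and decreasing in n.
        eps = 1.0 / (n + 1) ** 0.5
        others = [a for a in range(n_actions) if a != greedy]
        a = rng.choice(others) if others and rng.random() < eps else greedy
        # Observe the next state drawn from the true transition law.
        y = rng.choices(range(n_states), weights=true_p[x][a])[0]
        # Frequency estimator: p_hat(y | x, a) = N(x, a, y) / N(x, a).
        counts[x][a][y] += 1
        total = sum(counts[x][a])
        p_hat[x][a] = [c / total for c in counts[x][a]]
        x = y
    return p_hat, counts


# Hypothetical two-state, two-action example; all entries are positive,
# so the chain is irreducible under every stationary policy.
true_p = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.5, 0.5], [0.3, 0.7]]]
reward = [[1.0, 0.0], [0.0, 2.0]]
p_hat, counts = adaptive_nvi(true_p, reward, beta=0.9, steps=2000)
```

Comparing `p_hat` with `true_p` for increasing values of `steps` gives an informal feel for the consistency property stated in the review; the sketch itself proves nothing about convergence.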

zbMATH Keywords

unknown transition law

frequency estimators

finite state and action space

discounted Markov decision problem

nonstationary value iteration

strongly consistent estimators

MaRDI profile type

MaRDI publication profile

On Minimum Cost Per Unit Time Control of Markov Chains

Nonstationary Markov decision problems with converging parameters

Adaptive Strategies for Certain Classes of Controlled Markov Processes

Adaptive control of discounted Markov decision chains

Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution

Density estimation and adaptive control of Markov processes: Average and discounted criteria

The average-optimal adaptive control of a Markov renewal model in presence of an unknown parameter

Estimation and control in Markov chains

Estimation and control in discounted stochastic dynamic programming

full work available at URL

https://doi.org/10.1007/bf02055580

Identifiers

zbMATH Open document ID

10.1007/BF02055580

Mathematics Subject Classification ID

zbMATH DE Number

Sitelinks

Mathematics(1 entry)

mardi Publication:1174701

Retrieved from "https://portal.mardi4nfdi.de/w/index.php?title=Item:Q1174701&oldid=37335177"