Nonparametric estimation and adaptive control in a class of finite Markov decision chains (Q1174701)
scientific article
Statements
Nonparametric estimation and adaptive control in a class of finite Markov decision chains (English)
25 June 1992
A finite state and action space discounted Markov decision problem is considered in which the transition law is completely unknown. The transition law is estimated sequentially while the system is being controlled. It is assumed that the state space is irreducible under any stationary policy. The usual control policies, derived from the principle of estimation and control (optimality for the present estimate) and from nonstationary value iteration (one-step improvement using the present estimate), are modified by choosing any other action with a small (decreasing) probability. These modified policies are shown to lead to strongly consistent estimators and to be asymptotically discount optimal. The problem is similar to that of M. Kurano [J. Appl. Probab. 24, 270-276 (1987; Zbl 0631.90085)], where the average case is treated and the reward is random, too.
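The scheme summarized above can be illustrated with a minimal sketch. It is not the article's construction: the known deterministic reward table `reward`, the uniform fallback for unvisited state-action pairs, the exploration schedule `n ** -0.5`, and all function names are assumptions made for illustration. The sketch combines a frequency estimator of the transition law with one nonstationary value-iteration sweep per step, and with a small, decreasing probability it plays an action other than the currently greedy one.

```python
import random


def frequency_estimate(counts, s, a):
    """Empirical transition probabilities for (s, a).

    Assumption: an unvisited pair falls back to the uniform distribution.
    """
    row = counts[s][a]
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [c / total for c in row]


def nvi_step(v, counts, reward, beta):
    """One value-iteration sweep under the current estimate.

    Returns the updated value vector and the greedy action per state.
    """
    n_states, n_actions = len(reward), len(reward[0])
    new_v, greedy = [], []
    for s in range(n_states):
        q = []
        for a in range(n_actions):
            p_hat = frequency_estimate(counts, s, a)
            q.append(reward[s][a] + beta * sum(p * x for p, x in zip(p_hat, v)))
        best = max(range(n_actions), key=q.__getitem__)
        new_v.append(q[best])
        greedy.append(best)
    return new_v, greedy


def run_adaptive_control(true_p, reward, beta=0.9, steps=20000, seed=0):
    """Simulate estimation and control on a chain with transition law true_p.

    true_p[s][a] is the (hidden) distribution over next states; the
    controller only sees the sampled transitions.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(reward), len(reward[0])
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    v = [0.0] * n_states
    greedy = [0] * n_states
    s = 0
    for n in range(1, steps + 1):
        v, greedy = nvi_step(v, counts, reward, beta)
        eps = n ** -0.5  # assumed decreasing exploration probability
        others = [b for b in range(n_actions) if b != greedy[s]]
        if others and rng.random() < eps:
            a = rng.choice(others)  # deviate to some other action
        else:
            a = greedy[s]  # one-step improvement under the present estimate
        s_next = rng.choices(range(n_states), weights=true_p[s][a])[0]
        counts[s][a][s_next] += 1
        s = s_next
    return counts, v, greedy
```

For instance, with two states and two actions one might call `run_adaptive_control([[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]], [[1.0, 0.0], [0.0, 2.0]])` and compare `frequency_estimate(counts, s, a)` with the true rows. All transition probabilities in this toy chain are positive, so the irreducibility assumption holds for every stationary policy.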
unknown transition law
frequency estimators
finite state and action space
discounted Markov decision problem
nonstationary value iteration
strongly consistent estimators