Nonparametric estimation and adaptive control in a class of finite Markov decision chains (Q1174701)

From MaRDI portal
scientific article

    Statements

    Nonparametric estimation and adaptive control in a class of finite Markov decision chains (English)
    25 June 1992
    A discounted Markov decision problem with finite state and action spaces is considered, in which the transition law is completely unknown. The transition law is estimated sequentially while the system is being controlled. It is assumed that the state space is irreducible under any stationary policy. The usual control policies, derived from the principle of estimation and control (optimality with respect to the current estimate) and from nonstationary value iteration (one-step improvement using the current estimate), are modified by choosing any other action with a small, decreasing probability. These modified policies are shown to lead to strongly consistent estimators and to be asymptotically discount optimal. The problem is similar to that of M. Kurano [J. Appl. Probab. 24, 270-276 (1987; Zbl 0631.90085)], where the average case is treated and the reward is also random.
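The scheme described in the review can be sketched as follows. This is only an illustrative sketch, not the article's construction: the function names, the uniform guess for unvisited state-action pairs, the exploration schedule `n ** -0.5`, and the small example chain are all assumptions introduced here. The true law `P_true` is used solely to simulate transitions; the controller sees only the observed counts.

```python
import random


def frequency_estimate(counts):
    """Relative-frequency estimate of P(t | s, a); uniform for unvisited pairs."""
    n_s = len(counts)
    est = []
    for per_state in counts:
        rows = []
        for c in per_state:
            total = sum(c)
            rows.append([x / total for x in c] if total else [1.0 / n_s] * n_s)
        est.append(rows)
    return est


def q_values(P, r, beta, v, s):
    """One-step lookahead value of every action in state s."""
    return [r[s][a] + beta * sum(p * v[t] for t, p in enumerate(P[s][a]))
            for a in range(len(r[s]))]


def value_iteration(P, r, beta, iters=200):
    """Approximate the discounted value function under the transition law P."""
    v = [0.0] * len(r)
    for _ in range(iters):
        v = [max(q_values(P, r, beta, v, s)) for s in range(len(r))]
    return v


def adaptive_control(P_true, r, beta, steps, seed=0):
    """Run the perturbed estimate-then-control loop; return the final estimate."""
    rng = random.Random(seed)
    n_s, n_a = len(r), len(r[0])
    counts = [[[0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    s = 0
    for n in range(1, steps + 1):
        P_hat = frequency_estimate(counts)
        v = value_iteration(P_hat, r, beta, iters=50)
        q = q_values(P_hat, r, beta, v, s)
        a = max(range(n_a), key=q.__getitem__)  # optimal for the current estimate
        eps = n ** -0.5  # assumed schedule: decreasing, with a divergent sum
        if n_a > 1 and rng.random() < eps:
            a = rng.choice([b for b in range(n_a) if b != a])  # any other action
        t = rng.choices(range(n_s), weights=P_true[s][a])[0]  # simulated transition
        counts[s][a][t] += 1
        s = t
    return frequency_estimate(counts)


# Hypothetical two-state, two-action chain, irreducible under every policy.
P_true = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]]
r = [[1.0, 0.0], [0.0, 2.0]]
P_hat = adaptive_control(P_true, r, beta=0.9, steps=1000)
```

The occasional forced choice of a non-greedy action is what keeps every state-action pair visited, which is the property the frequency estimators need in order to converge. Replacing the full `value_iteration` call by a single sweep per time step would give a variant in the spirit of nonstationary value iteration.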
    unknown transition law
    frequency estimators
    finite state and action space
    discounted Markov decision problem
    nonstationary value iteration
    strongly consistent estimators
