A unified approach to adaptive control of average reward Markov decision processes (Q1095048)

From MaRDI portal
scientific article

    Statements

    1988
    The paper presents a general optimization method for adaptive average reward Markov decision problems. After each observation of the state and each update of the estimate of the unknown parameter, an optimal decision is determined by applying a policy improvement step to an auxiliary value function, which converges to the true relative value function as time increases. The method includes, as special cases, the classical procedure of estimation and control [cf. \textit{M. Kurano}, J. Oper. Res. Soc. Japan 15, 67-76 (1972; Zbl 0238.90006), and \textit{P. Mandl}, Adv. Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)] and the nonstationary value iteration [cf. \textit{A. Federgruen} and \textit{P. J. Schweitzer}, J. Optimization Theory Appl. 34, 207-241 (1981; Zbl 0457.90083), \textit{R. S. Acosta-Abreu} and \textit{O. Hernández-Lerma}, Control Cybern. 14, 313-322 (1985; Zbl 0606.90130), and \textit{M. Kurano}, J. Appl. Probab. 24, 270-276 (1987)], as well as many new procedures.
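    The Python sketch below is not the paper's procedure. It only illustrates, under simplifying assumptions, how a single scheme of the form "estimate, update an auxiliary value function, take a policy improvement step" can cover both special cases named above. Assumptions (all hypothetical): finite state and action sets, known rewards r(s, a), and an unknown transition law that is estimated by add-one-smoothed empirical frequencies (standing in for the estimate of the unknown parameter). The auxiliary value function v is updated by relative value iteration normalized at state 0, and the action is chosen greedily as argmax_a [r(s, a) + sum_j p_hat(j | s, a) v(j)]. With `sweeps=1` the scheme performs one value-iteration step per observation, in the spirit of nonstationary value iteration. With many sweeps, v approximately solves the optimality equation of the estimated model (provided that model is unichain and aperiodic), which mimics estimation and control. The function names `adaptive_control`, `relative_vi_step`, `improve`, and `estimate` are illustrative only.

```python
import random

def q_value(r, p_hat, v, s, a):
    """One-step lookahead r(s, a) + sum_j p_hat(j | s, a) * v(j)."""
    return r[s][a] + sum(pj * vj for pj, vj in zip(p_hat[s][a], v))

def improve(r, p_hat, v, s):
    """Policy improvement step: an action that is greedy w.r.t. the auxiliary value v."""
    return max(range(len(r[s])), key=lambda a: q_value(r, p_hat, v, s, a))

def relative_vi_step(r, p_hat, v, ref=0):
    """One relative value iteration sweep under the estimated model, normalized so v[ref] == 0."""
    w = [max(q_value(r, p_hat, v, s, a) for a in range(len(r[s]))) for s in range(len(r))]
    g = w[ref]
    return [x - g for x in w]

def estimate(counts):
    """Hypothetical estimator: smoothed empirical transition law p_hat[s][a][j]."""
    return [[[c / sum(row) for c in row] for row in per_state] for per_state in counts]

def adaptive_control(r, p_true, steps, sweeps, seed=0):
    """Run the adaptive scheme for `steps` periods; return (average reward, final v).

    sweeps=1: one value-iteration step per observation (nonstationary value iteration).
    large sweeps: v approximately solves the estimated model's optimality equation,
    assuming that model is unichain and aperiodic (estimation and control).
    No exploration mechanism is included.
    """
    rng = random.Random(seed)
    n, m = len(r), len(r[0])
    counts = [[[1] * n for _ in range(m)] for _ in range(n)]  # add-one prior
    v = [0.0] * n
    s, total = 0, 0.0
    for _ in range(steps):
        p_hat = estimate(counts)            # re-estimate the unknown transition law
        for _ in range(sweeps):             # update the auxiliary value function
            v = relative_vi_step(r, p_hat, v)
        a = improve(r, p_hat, v, s)         # policy improvement step at the current state
        total += r[s][a]
        s_next = rng.choices(range(n), weights=p_true[s][a])[0]  # simulate the true system
        counts[s][a][s_next] += 1
        s = s_next
    return total / steps, v
```

    For example, `adaptive_control(r, p_true, steps=1000, sweeps=1)` returns the empirical average reward and the final auxiliary value function. Because the rule is purely greedy, some actions may never be tried, so the sketch shows the structure of the update rather than the conditions under which the paper's convergence results hold.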
    adaptive control
    adaptive average reward Markov decision
    policy improvement
    nonstationary value iteration