A unified approach to adaptive control of average reward Markov decision processes (Q1095048)

Property / MaRDI profile type: MaRDI publication profile (normal rank)
Property / cites work (each with normal rank):
    Q3745652
    Q3313754
    The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms
    Nonstationary Markov decision problems with converging parameters
    Contraction mappings underlying undiscounted Markov decision problems
    Adaptive control of discounted Markov decision chains
    Q5599448
    Bounds and good policies in stationary finite–stage Markovian decision problems
    Q5649557
    Adaptive Policies in Markov Decision Processes with Uncertain Transition Matrices
    Learning algorithms for Markov decision processes
    Estimation and control in Markov chains
    Q3881672
    Q4173220

Latest revision as of 13:26, 18 June 2024

Language: English
Label: A unified approach to adaptive control of average reward Markov decision processes
Description: scientific article
Also known as: (none listed)

    Statements

    A unified approach to adaptive control of average reward Markov decision processes (English)
    1988
    The paper presents a general optimization method for adaptive average reward Markov decision problems. After each observation of the state and each estimation of the unknown parameter, optimal decisions are determined by applying a policy improvement step to an auxiliary value function that converges, as time increases, to the true relative value. The method includes as special cases the classical procedure of estimation and control [cf. M. Kurano, J. Oper. Res. Soc. Japan 15, 67-76 (1972; Zbl 0238.90006), and P. Mandl, Adv. Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)] and nonstationary value iteration [cf. A. Federgruen and P. J. Schweitzer, J. Optimization Theory Appl. 34, 207-241 (1981; Zbl 0457.90083), R. S. Acosta-Abreu and O. Hernández-Lerma, Control Cybern. 14, 313-322 (1985; Zbl 0606.90130), and M. Kurano, J. Appl. Probab. 24, 270-276 (1987)], as well as many new procedures.
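    The loop described in the review (observe the state, re-estimate the unknown parameter, apply a policy improvement step to an auxiliary value function) can be illustrated with a minimal Python sketch. It is not the paper's scheme: it assumes a finite state and action space, takes the "unknown parameter" to be the transition law itself, estimates it by empirical counts, and updates the auxiliary function by one step of relative value iteration per period (the nonstationary value iteration special case). The names `improvement_step` and `adaptive_control`, the uniform prior counts, and the example numbers are all hypothetical.

```python
import random


def improvement_step(state, p_hat, reward, v):
    """One-step lookahead on v at `state`; returns (best action, best value)."""
    best_a, best_q = None, float("-inf")
    for a in range(len(reward[state])):
        q = reward[state][a] + sum(p_hat[state][a][j] * v[j] for j in range(len(v)))
        if q > best_q:
            best_a, best_q = a, q
    return best_a, best_q


def adaptive_control(p_true, reward, steps, seed=0):
    """Run the estimate / improve / act loop; returns (average reward, final v)."""
    rng = random.Random(seed)
    n_states, n_actions = len(reward), len(reward[0])
    # Assumption: counts start at 1 so every estimated row is a valid distribution.
    counts = [[[1.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    v = [0.0] * n_states  # auxiliary value function
    state, total = 0, 0.0
    for _ in range(steps):
        # Estimation: empirical transition probabilities from the counts so far.
        p_hat = [[[c / sum(row) for c in row] for row in counts[s]]
                 for s in range(n_states)]
        # One relative value iteration step under the current estimate,
        # normalised at reference state 0 to keep v bounded.
        new_v = [improvement_step(s, p_hat, reward, v)[1] for s in range(n_states)]
        v = [x - new_v[0] for x in new_v]
        # Policy improvement at the observed state, then act in the true system.
        action, _ = improvement_step(state, p_hat, reward, v)
        total += reward[state][action]
        next_state = rng.choices(range(n_states), weights=p_true[state][action])[0]
        counts[state][action][next_state] += 1.0
        state = next_state
    return total / steps, v


# Hypothetical two-state, two-action example: p_true[s][a][j], reward[s][a].
p_true = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]]
reward = [[1.0, 0.0], [0.0, 2.0]]
avg_reward, v_final = adaptive_control(p_true, reward, steps=2000)
print(avg_reward, v_final)
```

    The sketch always acts greedily, so it does no deliberate exploration. Whether the estimates and the resulting policy converge depends on conditions this illustration does not check.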
    adaptive control
    adaptive average reward Markov decision
    policy improvement
    nonstationary value iteration