Learning algorithms for Markov decision processes
From MaRDI portal
Publication:3768706
DOI: 10.2307/3214080
zbMath: 0631.90085
OpenAlex: W4232384885
MaRDI QID: Q3768706
Publication date: 1987
Published in: Journal of Applied Probability
Full work available at URL: https://doi.org/10.2307/3214080
Keywords: learning algorithms; adaptive control; relaxation; average reward; unknown transition law; finite Markov decision chain; unknown random one stage rewards
Related Items
A unified approach to adaptive control of average reward Markov decision processes ⋮ Adaptive policy-iteration and policy-value-iteration for discounted Markov decision processes ⋮ Central limit theorem for the estimator of the value of an optimal stopping problem ⋮ Adaptive control of Markov chains with local updates ⋮ Statistical inference for a finite optimal stopping problem with unknown transition probabilities ⋮ Computationally efficient algorithms for on-line optimization of Markov decision processes ⋮ Unnamed Item