Learning algorithms for Markov decision processes
Publication:3768706
DOI: 10.2307/3214080
zbMath: 0631.90085
MaRDI QID: Q3768706
Publication date: 1987
Published in: Journal of Applied Probability
Full work available at URL: https://doi.org/10.2307/3214080
Keywords: learning algorithms; adaptive control; relaxation; average reward; unknown transition law; finite Markov decision chain; unknown random one stage rewards
90C40: Markov and semi-Markov decision processes
Related Items
Computationally efficient algorithms for on-line optimization of Markov decision processes
Statistical inference for a finite optimal stopping problem with unknown transition probabilities
Central limit theorem for the estimator of the value of an optimal stopping problem
Unnamed Item
Adaptive policy-iteration and policy-value-iteration for discounted Markov decision processes