Learning control of finite Markov chains with an explicit trade-off between estimation and control
From MaRDI portal
Publication:3828969
DOI: 10.1109/21.21595
zbMath: 0674.65036
OpenAlex: W2015667537
MaRDI QID: Q3828969
Hiroshi Takeda, Mitsuo Sato, Ken-Ichi Abe
Publication date: 1988
Published in: IEEE Transactions on Systems, Man, and Cybernetics
Full work available at URL: https://doi.org/10.1109/21.21595
stochastic control, control parameter, finite Markov chains, control policy, performance criterion, asymptotic optimization, frequency coefficient, large size models, learning control problem
Numerical optimization and variational techniques (65K10); Estimation and detection in stochastic control theory (93E10); Markov chains (discrete-time Markov processes on discrete state spaces) (60J10); Optimal stochastic control (93E20)
Related Items
An incremental off-policy search in a model-free Markov decision process using a single sample path; A job scheduling approach based on a learning automaton for a distributed computing system; \({\mathcal Q}\)-learning