Learning control of finite Markov chains with an explicit trade-off between estimation and control

From MaRDI portal

Publication:3828969

DOI: 10.1109/21.21595; zbMath: 0674.65036; OpenAlex: W2015667537; MaRDI QID: Q3828969

Hiroshi Takeda, Mitsuo Sato, Ken-Ichi Abe

Publication date: 1988

Published in: IEEE Transactions on Systems, Man, and Cybernetics

Full work available at URL: https://doi.org/10.1109/21.21595

zbMATH Keywords

stochastic control; control parameter; finite Markov chains; control policy; performance criterion; asymptotic optimization; frequency coefficient; large size models; learning control problem

Mathematics Subject Classification ID

Numerical optimization and variational techniques (65K10); Estimation and detection in stochastic control theory (93E10); Markov chains (discrete-time Markov processes on discrete state spaces) (60J10); Optimal stochastic control (93E20)

Related Items

An incremental off-policy search in a model-free Markov decision process using a single sample path; A job scheduling approach based on a learning automaton for a distributed computing system; \(\mathcal{Q}\)-learning
