Sleeping experts and bandits approach to constrained Markov decision processes
Publication:901196
DOI: 10.1016/j.automatica.2015.10.015, zbMath: 1329.93154, arXiv: 1412.4898, OpenAlex: W2132036095, MaRDI QID: Q901196
Publication date: 23 December 2015
Published in: Automatica
Full work available at URL: https://arxiv.org/abs/1412.4898
Learning and adaptive systems in artificial intelligence (68T05); Optimal stochastic control (93E20); Markov and semi-Markov decision processes (90C40)
Cites Work
- An exact iterative search algorithm for constrained Markov decision processes
- Simulation-based algorithms for Markov decision processes
- Sample average approximation of expected value constrained stochastic programs
- Regret bounds for sleeping experts and bandits
- Non-randomized policies for constrained Markov decision processes
- Constrained Discounted Markov Decision Processes and Hamiltonian Cycles
- The Sample Average Approximation Method for Stochastic Discrete Optimization
- Approximate Dynamic Programming
- Q-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control
- Stochastic approximation algorithms for constrained optimization via simulation
- Simulation-Based Discrete Optimization of Stochastic Discrete Event Systems Subject to Non Closed-Form Constraints
- Stochastically Constrained Ranking and Selection via SCORE
- Probability Inequalities for Sums of Bounded Random Variables