Learning Algorithms for Markov Decision Processes with Average Cost
Publication:2753225
DOI: 10.1137/S0363012999361974 ⋮ zbMath: 1001.93091 ⋮ OpenAlex: W2154204727 ⋮ MaRDI QID: Q2753225
Dimitri P. Bertsekas, Vivek S. Borkar, Jinane Abounadi
Publication date: 29 October 2001
Published in: SIAM Journal on Control and Optimization
Full work available at URL: https://doi.org/10.1137/s0363012999361974
Dynamic programming in optimal control and differential games (49L20) ⋮ Optimal stochastic control (93E20) ⋮ Stochastic approximation (62L20) ⋮ Stochastic stability in control theory (93E15) ⋮ Markov and semi-Markov decision processes (90C40)
Related Items (26)
Multiscale Q-learning with linear function approximation ⋮ A sojourn-based approach to semi-Markov reinforcement learning ⋮ Deep reinforcement learning for wireless sensor scheduling in cyber-physical systems ⋮ Risk-Sensitive Reinforcement Learning via Policy Gradient Search ⋮ A framework for transforming specifications in reinforcement learning ⋮ Optimal sensor scheduling for remote state estimation with limited bandwidth: a deep reinforcement learning approach ⋮ Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds ⋮ Approachability in Stackelberg stochastic games with vector costs ⋮ Analyzing anonymity attacks through noisy channels ⋮ Solutions of the average cost optimality equation for Markov decision processes with weakly continuous kernel: the fixed-point approach revisited ⋮ Q-learning for Markov decision processes with a satisfiability criterion ⋮ Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques ⋮ Learning dynamic prices in electronic retail markets with customer segmentation ⋮ Optimal Distributed Uplink Channel Allocation: A Constrained MDP Formulation ⋮ Opportunistic Transmission over Randomly Varying Channels ⋮ Empirical Dynamic Programming ⋮ Fitted Q-iteration by functional networks for control problems ⋮ A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs ⋮ Look-ahead control of conveyor-serviced production station by using potential-based online policy iteration ⋮ Natural actor-critic algorithms ⋮ Dynamic pricing models for electronic business ⋮ Fundamental design principles for reinforcement learning algorithms ⋮ Empirical Q-Value Iteration ⋮ Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities ⋮ Batch policy learning in average reward Markov decision processes ⋮ Whittle index based Q-learning for restless bandits with average reward