Infinite Horizon Average Cost Dynamic Programming Subject to Total Variation Distance Ambiguity

DOI: 10.1137/18M1210514; zbMATH Open: 1421.93148; arXiv: 1512.06510; MaRDI QID: Q5232245; FDO: Q5232245

Authors: I. Tzortzis, Themistoklis Charalambous, Charalambos D. Charalambous

Publication date: 30 August 2019

Published in: SIAM Journal on Control and Optimization

Abstract: We analyze the infinite horizon minimax average cost Markov Control Model (MCM), for a class of controlled process conditional distributions, which belong to a ball, with respect to total variation distance metric, centered at a known nominal controlled conditional distribution with radius $R \in [0, 2]$, in which the minimization is over the control strategies and the maximization is over conditional distributions. Upon performing the maximization, a dynamic programming equation is obtained which includes, in addition to the standard terms, the oscillator semi-norm of the cost-to-go. First, the dynamic programming equation is analyzed for finite state and control spaces. We show that if the nominal controlled process distribution is irreducible, then for every stationary Markov control policy the maximizing conditional distribution of the controlled process is also irreducible for $R \in [0, R_{\max}]$. Second, the generalized dynamic programming is analyzed for Borel spaces. We derive necessary and sufficient conditions for any control strategy to be optimal. Through our analysis, new dynamic programming equations and new policy iteration algorithms are derived. The main feature of the new policy iteration algorithms (which are applied for finite alphabet spaces) is that the policy evaluation and policy improvement steps are performed by using the maximizing conditional distribution, which is obtained via a water filling solution. Finally, the application of the new dynamic programming equations and the corresponding policy iteration algorithms are shown via illustrative examples.
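The inner maximization mentioned in the abstract can be illustrated on a finite alphabet. The following is a minimal sketch, not the authors' algorithm: it assumes the total variation distance is the sum of absolute differences between probability vectors (consistent with a radius in [0, 2]), and the names `maximizing_distribution`, `mu`, `ell`, and `R` are hypothetical. Mass is removed from the lowest-cost states in a water-filling manner and placed on a highest-cost state.

```python
import numpy as np

def maximizing_distribution(mu, ell, R):
    """Sketch: maximize sum(nu * ell) over pmfs nu with sum(|nu - mu|) <= R.

    Assumptions (not taken from the paper): finite alphabet, and total
    variation measured as the l1 distance, so R ranges over [0, 2].
    """
    mu = np.asarray(mu, dtype=float)
    ell = np.asarray(ell, dtype=float)
    nu = mu.copy()

    is_top = ell == ell.max()  # states attaining the maximum cost
    # At most R/2 can be moved, and no more than the mass outside the top set.
    alpha = min(R / 2.0, 1.0 - mu[is_top].sum())

    # Remove mass from the non-top states, lowest cost first.
    remaining = alpha
    for i in np.argsort(ell, kind="stable"):
        if remaining <= 0.0:
            break
        if is_top[i]:
            continue
        take = min(nu[i], remaining)
        nu[i] -= take
        remaining -= take

    # Add the mass actually removed to one maximizing state.
    nu[np.flatnonzero(is_top)[0]] += alpha - remaining
    return nu

mu = [0.5, 0.3, 0.2]   # nominal distribution (illustrative values)
ell = [1.0, 2.0, 4.0]  # cost-to-go values (illustrative values)
nu = maximizing_distribution(mu, ell, R=0.4)
print(nu, float(np.dot(nu, ell)))
```

In this example the worst-case expectation exceeds the nominal one by (R/2)(max ell - min ell) = 0.6, which is the kind of oscillation term the abstract refers to; for larger R the removed mass spills over to the next-lowest states.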

Full work available at URL: https://arxiv.org/abs/1512.06510

zbMATH Keywords

dynamic programming; minimax; stochastic control; total variation distance; policy iteration; infinite horizon; average cost; Markov control models

Mathematics Subject Classification ID

Dynamic programming (90C39); Minimax problems in mathematical programming (90C47); Optimal stochastic control (93E20)

Cited In (3)
