Stability estimation of some Markov controlled processes (Q2111840)

scientific article

    Statements

    Stability estimation of some Markov controlled processes (English)
    17 January 2023
    The paper analyses a discrete-time Markov controlled process with an expected total discounted reward criterion. The distribution \(G\) of the underlying randomness is unknown, but is assumed to be approximated by some known distribution \(\tilde G\) -- obtained, for example, from statistical data. The `real' process \(X\) is driven by \(G\); the `approximating' process \(\tilde X\) is driven by \(\tilde G\). The aim is to develop stability estimates comparing the `real' and `approximating' processes. Future rewards are discounted by a factor \(\alpha \in [0,1)\), and the reward function \(r\) is assumed to be uniformly bounded, say by \(b < \infty\). Let \(V(\pi)\) and \(\tilde V(\pi)\) be the value functions when policy \(\pi\) is followed for the `real' and `approximating' process, respectively: \[ \textstyle V(\pi) := \mathbb E^\pi\bigl( \sum_{t\ge1} \alpha^{t-1} r(X_{t-1}, a_t) \bigr) \quad\text{and}\quad \tilde V(\pi) := \mathbb E^\pi\bigl( \sum_{t\ge1} \alpha^{t-1} r(\tilde X_{t-1}, a_t) \bigr); \] here, \(a_t\) is the \textit{action} at time \(t\), chosen according to the policy and the information revealed up to that time. It is known that, under certain conditions, there exist \textit{stationary} policies -- that is, ones which choose the next action as a function only of the current state -- \(f_\star\) and \(\tilde{f_\star}\) which optimise the respective value functions: \[ \textstyle V(f_\star) = \sup_\pi V(\pi) \quad\text{and}\quad \tilde V(\tilde{f_\star}) = \sup_{\tilde \pi} \tilde V(\tilde \pi), \] where the suprema are over \emph{all} policies, not just stationary ones. The goal is then to bound the \textit{stability index} \[ \Delta := V(f_\star) - V(\tilde{f_\star}) \ge 0, \] which measures how far from optimal the `approximating' policy is when used in the `real' process.

    Theorem 1 proves the bound \[ \Delta \le \frac{2 \alpha b}{(1 - \alpha)^2} d_\mathsf{TV}(G, \tilde G), \] where \(d_\mathsf{TV}\) denotes the \textit{total-variation distance}, under certain conditions. The conditions are relatively simple, as are the statement and the proof, making the theorem appealing. However, the authors note a shortcoming: in many situations, \(G\) is continuous and \(\tilde G\) is an empirical distribution, which is inherently discrete; the two measures are then mutually singular, so \(d_\mathsf{TV}(G, \tilde G)\) takes its maximal value and the bound cannot improve no matter how much data is used.

    Theorem 2 rectifies this issue by obtaining a more involved bound \[ \Delta \lesssim d_\mathsf{Dudley}(G, \tilde G), \] where \(d_\mathsf{Dudley}\) denotes the \textit{Dudley metric}, under certain conditions, such as requiring the reward function to be Lipschitz. The ``\(\lesssim\)'' symbol hides factors depending on the reward function, such as its Lipschitz constant. The proof of this theorem is much more detailed and nuanced. The example where \(\tilde G\) is an empirical distribution is then worked out: very roughly, if \(G\) has finite exponential moments in an interval around \(0\), then \(d_\mathsf{Dudley}(G, \tilde G_n) \to 0\) as \(n \to \infty\), where \(\tilde G_n\) is the empirical distribution obtained from \(n\) iid samples. By contrast, \(d_\mathsf{TV}(G, \tilde G_n)\) remains maximal for every \(n\), so the bound of Theorem 1 never improves. The assumptions of Theorem 2 are still required, naturally.

    The paper closes with a few example applications. The first demonstrates the necessity of certain conditions in Theorem 2. The next two are particularly pleasing: the second models dam operations and stocks of water; the third models a controlled environmental stochastic process.
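    To make these quantities concrete, here is a minimal numerical sketch, not taken from the paper: a toy dam-style control problem on a finite state space in which the driving noise has a `real' distribution \(G\) and a perturbed known approximation \(\tilde G\), both with finite support so that the total-variation distance is elementary to compute. Value iteration gives the optimal stationary policies \(f_\star\) and \(\tilde f_\star\); evaluating \(\tilde f_\star\) in the `real' model yields the stability index \(\Delta\), which can be compared with the Theorem 1 quantity \(2\alpha b (1-\alpha)^{-2}\, d_\mathsf{TV}(G, \tilde G)\). All ingredients (dynamics, reward, distributions, parameters, the \(\tfrac12\)-normalisation of \(d_\mathsf{TV}\)) are invented for illustration, and no attempt is made here to verify the hypotheses of the theorem.

```python
# Toy illustration (not from the paper): stability index vs. the Theorem 1 quantity
# in a finite dam-style discounted MDP.  All modelling choices below are hypothetical.
import numpy as np

K, A_MAX, D = 10, 3, 4        # states 0..K, actions 0..A_MAX, noise support 0..D
alpha = 0.9                   # discount factor

rng = np.random.default_rng(0)
G = rng.dirichlet(np.ones(D + 1))        # `real' noise distribution (finite support)
G_tilde = 0.9 * G + 0.1 / (D + 1)        # known approximation of G
d_tv = 0.5 * np.abs(G - G_tilde).sum()   # total-variation distance (1/2-normalisation assumed)

def reward(x, a):
    # bounded reward: value of stored water minus a cost for the release action a
    return min(x, 5) - 0.5 * a

b = max(abs(reward(x, a)) for x in range(K + 1) for a in range(A_MAX + 1))
R = np.array([[reward(x, a) for a in range(A_MAX + 1)] for x in range(K + 1)])

def kernel(g):
    # P[a, x, y] = P(next state = y | state x, action a) for the dynamics
    # X_{t+1} = clip(X_t - a_t + xi_t, 0, K) with xi_t ~ g
    P = np.zeros((A_MAX + 1, K + 1, K + 1))
    for a in range(A_MAX + 1):
        for x in range(K + 1):
            for xi, p in enumerate(g):
                P[a, x, min(max(x - a + xi, 0), K)] += p
    return P

def optimal_policy(P):
    # value iteration for the alpha-discounted criterion; returns a greedy stationary policy
    V = np.zeros(K + 1)
    for _ in range(2000):
        Q = R + alpha * np.einsum('axy,y->xa', P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def evaluate(policy, P):
    # exact value of a stationary policy: solve (I - alpha * P_f) V = r_f
    P_f = np.array([P[policy[x], x] for x in range(K + 1)])
    r_f = np.array([R[x, policy[x]] for x in range(K + 1)])
    return np.linalg.solve(np.eye(K + 1) - alpha * P_f, r_f)

P_real, P_approx = kernel(G), kernel(G_tilde)
f_star = optimal_policy(P_real)        # optimal for the `real' process
f_tilde = optimal_policy(P_approx)     # optimal for the `approximating' process

delta = np.max(evaluate(f_star, P_real) - evaluate(f_tilde, P_real))
theorem1 = 2 * alpha * b / (1 - alpha) ** 2 * d_tv
print(f"stability index Delta = {delta:.4f},  Theorem 1 quantity = {theorem1:.4f}")
```

    Exact policy evaluation via the linear system, rather than simulation, keeps value-iteration error out of the reported \(\Delta\).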
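    A second hypothetical sketch illustrates the point about empirical approximations. Take \(G\) to be the unit-rate exponential distribution (continuous, with finite exponential moments in an interval around \(0\)) and let \(\tilde G_n\) be the empirical distribution of \(n\) iid samples. Since \(G\) is continuous and \(\tilde G_n\) is discrete, the two are mutually singular, so the total-variation distance stays maximal for every \(n\). The Dudley (bounded-Lipschitz) metric, on the other hand, is dominated by the Wasserstein-1 distance, which in one dimension is the \(L^1\) gap between the distribution functions and is easy to approximate numerically; it visibly shrinks as \(n\) grows. The choice of \(G\), the truncation grid, and the use of the Wasserstein-1 distance as an upper bound are assumptions made for this illustration, not taken from the paper.

```python
# Hypothetical illustration: for a continuous G and its empirical version G_n,
# the total-variation distance is maximal for every n, whereas a Dudley-type
# distance (bounded above here by the Wasserstein-1 distance) tends to 0.
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 20.0, 4001)      # grid truncating the positive half-line
dx = grid[1] - grid[0]
F = 1.0 - np.exp(-grid)                  # cdf of G = Exponential(1)

for n in (10, 100, 1_000, 10_000):
    sample = np.sort(rng.exponential(scale=1.0, size=n))
    F_n = np.searchsorted(sample, grid, side='right') / n   # empirical cdf on the grid
    w1 = np.sum(np.abs(F - F_n)) * dx    # approximate Wasserstein-1 = L1 gap of the cdfs
    print(f"n = {n:6d}:  d_TV(G, G_n) maximal;  d_Dudley(G, G_n) <= W1 ~ {w1:.4f}")
```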
    optimal control policy
    stability inequality
    total variation
    Dudley metrics

    Identifiers