Optimal switching problem for countable Markov chains: Average reward criterion (Q1396952)

From MaRDI portal
scientific article
    15 July 2003
The author studies the following generalization of optimal stopping. The prices of a certain commodity are determined by an observable discrete-time Markov process \((x_t)\) with countable state space \(X\) such that all states of \(X\) form a single positive recurrent class. If the controller sells (purchases) one unit of the commodity at time \(t\), he obtains the reward \(f(x_t)\) (pays \(g(x_t)\)). It is assumed that at any time epoch he can hold either one unit or zero units of the commodity.

A strategy \(({\mathcal T}_s,{\mathcal T}_p)\) consists of two increasing sequences of stopping times, \({\mathcal T}_s: \sigma_1\leq \sigma_2\leq\cdots\) and \({\mathcal T}_p: \tau_1\leq \tau_2\leq\cdots\). Suppose the controller applies the strategy \(({\mathcal T}_s,{\mathcal T}_p)\). If he initially possesses one unit (zero units) of the commodity, his reward functional for the time interval \([0,n]\) is given by \[ J_s({\mathcal T}_s, n)= f(x_{\sigma_1})I\{\sigma_1< n\}- g(x_{\sigma_2}) I\{\sigma_2< n\}+ f(x_{\sigma_3}) I\{\sigma_3< n\}-\cdots \] and \[ J_p({\mathcal T}_p, n)= -g(x_{\tau_1}) I\{\tau_1< n\}+ f(x_{\tau_2}) I\{\tau_2< n\}- g(x_{\tau_3}) I\{\tau_3< n\}+\cdots, \] respectively. His expected rewards are \[ V(x,{\mathcal T}_s, n)= E_x J_s({\mathcal T}_s, n),\quad W(x,{\mathcal T}_p, n)= E_x J_p({\mathcal T}_p, n),\qquad x\in X, \] which yield the value functions \[ V(x, n)= \sup_{{\mathcal T}_s} V(x,{\mathcal T}_s, n),\quad W(x,n)= \sup_{{\mathcal T}_p} W(x,{\mathcal T}_p, n),\qquad x\in X \] (the suprema being taken over all increasing sequences of stopping times).

A strategy \(({\mathcal T}_s,{\mathcal T}_p)\) is called weak average optimal if for any increasing sequence \({\mathcal T}\) of stopping times and any \(x\in X\) \[ \liminf_{n\to\infty} V(x,{\mathcal T}_s, n)/n\geq \liminf_{n\to\infty} V(x,{\mathcal T},n)/n \] and \[ \liminf_{n\to\infty} W(x,{\mathcal T}_p,n)/n\geq \liminf_{n\to\infty} W(x,{\mathcal T},n)/n. \] It is called strong average optimal if for any \(x\in X\) \[ \lim_{n\to\infty} V(x,{\mathcal T}_s, n)/n= \lim_{n\to\infty} V(x,n)/n,\quad \lim_{n\to\infty} W(x,{\mathcal T}_p, n)/n= \lim_{n\to\infty} W(x,n)/n. \] Under certain assumptions, the author explicitly obtains strategies that are weak (respectively, strong) average optimal.
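The switching reward \(J_s({\mathcal T}_s,n)/n\) can be illustrated by a small simulation. The following sketch is not from the paper: the reflected random-walk chain, the price functions `f` and `g`, and the threshold strategy (sell at the top state, buy back at the bottom state) are all illustrative assumptions chosen so that the chain is positive recurrent on a finite state space.

```python
import random

# Illustrative example (not from the paper): a simple random walk on
# {0,1,2,3,4}, reflected at both ends, is positive recurrent.
def step(x):
    if x == 0:
        return 1
    if x == 4:
        return 3
    return x + random.choice((-1, 1))

f = lambda x: float(x)   # assumed selling price in state x
g = lambda x: x + 0.2    # assumed purchase cost (fixed transaction spread)

def average_reward(n, sell_at=4, buy_at=0):
    """Empirical J_s(T_s, n)/n for the threshold strategy: starting with
    one unit, sell whenever the chain hits `sell_at`, buy back whenever
    it hits `buy_at`.  Rewards and costs alternate as in the functional
    f(x_{sigma_1}) - g(x_{sigma_2}) + f(x_{sigma_3}) - ..."""
    x, holding, total = 2, True, 0.0
    for _ in range(n):
        if holding and x == sell_at:
            total += f(x)
            holding = False
        elif not holding and x == buy_at:
            total -= g(x)
            holding = True
        x = step(x)
    return total / n

print(average_reward(200_000))
```

Each sell/buy cycle gains \(f(4)-g(0)=3.8\), so the long-run average reward per step is positive; its exact value is the cycle gain divided by the mean cycle length of the chain.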
    Keywords: positive recurrent chain; alternating costs and rewards; stopping times; average criterion