Optimal switching problem for countable Markov chains: Average reward criterion (Q1396952)

From MaRDI portal
scientific article
    15 July 2003
The author studies the following generalization of optimal stopping. The prices of a certain commodity are determined by an observable discrete-time Markov process \((x_t)\) with countable state space \(X\) such that all states of \(X\) form a single positive recurrent class. If the controller sells (purchases) one unit of the commodity at time \(t\), he obtains the reward \(f(x_t)\) (pays \(g(x_t)\)). It is assumed that at any time epoch he can hold either one unit or zero units of the commodity.

A strategy \(({\mathcal T}_s,{\mathcal T}_p)\) consists of two increasing sequences of stopping times, \({\mathcal T}_s: \sigma_1\leq \sigma_2\leq\cdots\) and \({\mathcal T}_p: \tau_1\leq \tau_2\leq\cdots\). Suppose the controller applies the strategy \(({\mathcal T}_s,{\mathcal T}_p)\). If he initially possesses one unit (zero units) of the commodity, his reward functional for the time interval \([0,n]\) is given by \[ J_s({\mathcal T}_s, n)= f(x_{\sigma_1})I\{\sigma_1< n\}- g(x_{\sigma_2}) I\{\sigma_2< n\}+ f(x_{\sigma_3}) I\{\sigma_3< n\}-\cdots \] and \[ J_p({\mathcal T}_p, n)= -g(x_{\tau_1}) I\{\tau_1< n\}+ f(x_{\tau_2}) I\{\tau_2< n\}- g(x_{\tau_3}) I\{\tau_3< n\}+\cdots, \] respectively. His expected rewards are \[ V(x,{\mathcal T}_s, n)= E_x J_s({\mathcal T}_s, n),\quad W(x,{\mathcal T}_p, n)= E_x J_p({\mathcal T}_p, n),\qquad x\in X, \] which yield the value functions \[ V(x, n)= \sup_{{\mathcal T}_s} V(x,{\mathcal T}_s, n),\quad W(x,n)= \sup_{{\mathcal T}_p} W(x,{\mathcal T}_p, n),\qquad x\in X \] (the suprema being taken over all increasing sequences of stopping times).

A strategy \(({\mathcal T}_s,{\mathcal T}_p)\) is called weak average optimal if for any increasing sequence \({\mathcal T}\) of stopping times and any \(x\in X\) \[ \liminf_{n\to\infty} V(x,{\mathcal T}_s, n)/n\geq \liminf_{n\to\infty} V(x,{\mathcal T},n)/n \] and \[ \liminf_{n\to\infty} W(x,{\mathcal T}_p,n)/n\geq \liminf_{n\to\infty} W(x,{\mathcal T},n)/n. \] It is called strong average optimal if for any \(x\in X\) \[ \lim_{n\to\infty} V(x,{\mathcal T}_s, n)/n= \lim_{n\to\infty} V(x,n)/n,\quad \lim_{n\to\infty} W(x,{\mathcal T}_p, n)/n= \lim_{n\to\infty} W(x,n)/n. \] Under certain assumptions, the author explicitly obtains strategies that are weak (respectively, strong) average optimal.
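The switching reward \(J_s({\mathcal T}_s,n)/n\) can be illustrated by a small simulation. The following sketch is not from the paper: the reflected random-walk chain, the price functions `f` and `g`, and the threshold strategy (sell at the top state, buy back at the bottom state) are all illustrative assumptions chosen so that the chain is positive recurrent on a finite state space.

```python
import random

# Illustrative example (not from the paper): a simple random walk on
# {0,1,2,3,4}, reflected at both ends, is positive recurrent.
def step(x):
    if x == 0:
        return 1
    if x == 4:
        return 3
    return x + random.choice((-1, 1))

f = lambda x: float(x)   # assumed selling price in state x
g = lambda x: x + 0.2    # assumed purchase cost (fixed transaction spread)

def average_reward(n, sell_at=4, buy_at=0):
    """Empirical J_s(T_s, n)/n for the threshold strategy: starting with
    one unit, sell whenever the chain hits `sell_at`, buy back whenever
    it hits `buy_at`.  Rewards and costs alternate as in the functional
    f(x_{sigma_1}) - g(x_{sigma_2}) + f(x_{sigma_3}) - ..."""
    x, holding, total = 2, True, 0.0
    for _ in range(n):
        if holding and x == sell_at:
            total += f(x)
            holding = False
        elif not holding and x == buy_at:
            total -= g(x)
            holding = True
        x = step(x)
    return total / n

print(average_reward(200_000))
```

Each sell/buy cycle gains \(f(4)-g(0)=3.8\), so the long-run average reward per step is positive; its exact value is the cycle gain divided by the mean cycle length of the chain.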
    Keywords: positive recurrent chain; alternating costs and rewards; stopping times; average criterion