Optimal switching problem for countable Markov chains: Average reward criterion (Q1396952)
Language | Label | Description | Also known as
---|---|---|---
English | Optimal switching problem for countable Markov chains: Average reward criterion | scientific article |
Statements
Optimal switching problem for countable Markov chains: Average reward criterion (English)
15 July 2003
The author studies the following generalization of optimal stopping. The prices of a certain commodity are driven by an observable discrete-time Markov process \((x_t)\) with countable state space \(X\), all of whose states form a single positive recurrent class. If the controller sells one unit of the commodity at time \(t\), he receives the reward \(f(x_t)\); if he purchases one unit, he pays \(g(x_t)\). At any time epoch he may hold either one unit or zero units of the commodity. A strategy \(({\mathcal T}_s,{\mathcal T}_p)\) consists of two increasing sequences of stopping times, \({\mathcal T}_s: \sigma_1\leq \sigma_2\leq\cdots\) and \({\mathcal T}_p: \tau_1\leq \tau_2\leq\cdots\).

Suppose the controller applies the strategy \(({\mathcal T}_s,{\mathcal T}_p)\). If he initially holds one unit (zero units) of the commodity, his reward over the time interval \([0,n]\) is given by \[ J_s({\mathcal T}_s, n)= f(x_{\sigma_1})I\{\sigma_1< n\}- g(x_{\sigma_2}) I\{\sigma_2< n\}+ f(x_{\sigma_3}) I\{\sigma_3< n\}- \cdots \] and \[ J_p({\mathcal T}_p, n)= -g(x_{\tau_1}) I\{\tau_1< n\}+ f(x_{\tau_2}) I\{\tau_2< n\}- g(x_{\tau_3}) I\{\tau_3< n\}+ \cdots, \] respectively. The expected rewards \[ V(x,{\mathcal T}_s, n)= E_x J_s({\mathcal T}_s, n),\quad W(x,{\mathcal T}_p, n)= E_x J_p({\mathcal T}_p, n),\qquad x\in X, \] yield the value functions \[ V(x, n)= \sup_{{\mathcal T}_s} V(x,{\mathcal T}_s, n),\quad W(x,n)= \sup_{{\mathcal T}_p} W(x,{\mathcal T}_p, n),\qquad x\in X, \] where the suprema are taken over all increasing sequences of stopping times.

A strategy \(({\mathcal T}_s,{\mathcal T}_p)\) is called weak average optimal if, for every increasing sequence \({\mathcal T}\) of stopping times and every \(x\in X\), \[ \liminf_{n\to\infty} V(x,{\mathcal T}_s, n)/n\geq \liminf_{n\to\infty} V(x,{\mathcal T},n)/n \] and \[ \liminf_{n\to\infty} W(x,{\mathcal T}_p,n)/n\geq \liminf_{n\to\infty} W(x,{\mathcal T},n)/n. \] It is called strong average optimal if, for every \(x\in X\), \[ \lim_{n\to\infty} V(x,{\mathcal T}_s, n)/n= \lim_{n\to\infty} V(x,n)/n,\quad \lim_{n\to\infty} W(x,{\mathcal T}_p, n)/n= \lim_{n\to\infty} W(x,n)/n. \] Under certain assumptions, the author explicitly constructs strategies that are weak (strong) average optimal.
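To make the alternating reward functional \(J_s({\mathcal T}_s,n)\) concrete, here is a minimal simulation sketch. It is not the model or the optimal strategy from the paper: the price chain (a reflected random walk on a finite set, standing in for the countable positive recurrent chain), the choices \(f(x)=g(x)=x\), and the threshold sell/buy rule are all hypothetical illustrations. The script estimates the long-run average reward \(J_s({\mathcal T}_s,n)/n\) for that strategy.

```python
import random

def step(x, lo=0, hi=4):
    """One step of a reflected random walk on {lo, ..., hi}
    (a finite stand-in for the countable positive recurrent chain X)."""
    x += random.choice([-1, 1])
    return max(lo, min(hi, x))

def average_reward(n=200_000, sell_at=3, buy_at=1, seed=0):
    """Empirical J_s(T_s, n)/n for an assumed threshold strategy:
    starting with one unit, sell when x >= sell_at (receive f(x) = x),
    then buy back when x <= buy_at (pay g(x) = x), and alternate."""
    random.seed(seed)
    x, holding, total = 2, True, 0.0
    for _ in range(n):
        if holding and x >= sell_at:       # sigma_k: sell one unit
            total += x                     # reward f(x) = x
            holding = False
        elif not holding and x <= buy_at:  # buy the unit back
            total -= x                     # cost g(x) = x
            holding = True
        x = step(x)
    return total / n

print(average_reward())
```

Because every completed sell/buy cycle here gains at least \(3-1=2\), the empirical average reward is positive; comparing such averages across threshold pairs mimics the weak average optimality criterion in the review.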
Keywords: positive recurrent chain; alternating costs and rewards; stopping times; average criterion