Ergodic and adaptive control of nearest-neighbor motions (Q1176541)

A controlled Markov chain \(X_n\), \(n=0,1,2,\dots\), on the state space \(S=\{0,1,2,\dots\}\) with transition matrix \(P_{u,\theta}=((p(i,j,u_i,\theta)))\), \(i,j\in S\), is considered, where \(u=[u_1,u_2,\dots]\) is the control vector and \(\theta\) is the ``unknown parameter''; here \(u_i\in D_i\), \(\theta\in A\), and \(D_i\), \(A\) are compact sets. A sequence \(\{\xi_n\}\), \(\xi_n=[\xi_n(0),\xi_n(1),\dots]\), is a control strategy (CS) if, for each \(i\in S\) and \(n\geq 0\), \(P_\theta(X_{n+1}=i\mid X_m,\xi_m,\ m\leq n)=p(X_n,i,\xi_n(X_n),\theta)\); if the \(\xi_n\) are identically distributed and, for each \(n\), \(\xi_n\) is independent of \(X_m\), \(m\leq n\), and \(\xi_m\), \(m<n\), then \(\{\xi_n\}\) is called a stationary randomized strategy (SRS). The following assumptions are made: 1) under any SRS, \(\{X_n\}\) has a single communicating class \(S\); 2) for each \(i\in S\), there is a finite set \(R_i\subset S\) such that \(p(i,j,\cdot,\theta)\equiv 0\) for \(j\notin R_i\), and, for \(j\in R_i\), the quantity \(I\{p(i,j,u,\theta)>0\}\times\ln(p(i,j,u,\theta)/p(i,j,u,\theta_0))\) is uniformly bounded in \(i\), \(j\), \(u\), \(\theta\) and continuous in \(\theta\) uniformly in \(i\), \(j\), \(u\); 3) for \(\theta\neq\theta'\) in \(A\), there exists an \(i=i(\theta,\theta')\in S\) such that, for every \(u\in D_i\), there is a \(j=j(i,u,\theta,\theta')\in S\) with \(p(i,j,u,\theta)\neq p(i,j,u,\theta')\). The self-tuning approach to adaptive control is applied to the class of Markov chains described above, which are called nearest-neighbour motions. For compact parameter and control spaces, the almost-sure optimality of the self-tuner for an ergodic cost criterion is established.
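The setting above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the construction of the reviewed paper: the kernel `p` (move up with probability \(\theta u\), otherwise move down, staying at 0 when already there), the finite control set, the finite candidate set standing in for \(A\), and all function names are hypothetical choices made for illustration. It runs the chain under a stationary randomized strategy and recovers \(\theta\) by maximum likelihood, which is the estimation step a self-tuner would rely on; the choice of an optimal control for the estimated parameter is not shown.

```python
import math
import random


def p(i, j, u, theta):
    """Hypothetical nearest-neighbour kernel (an assumption, not from the source):
    from state i move to i+1 with probability theta*u, otherwise to max(i-1, 0)."""
    up = theta * u
    if j == i + 1:
        return up
    if j == max(i - 1, 0):
        return 1.0 - up
    return 0.0


def simulate(theta, controls, n_steps, rng):
    """Run the chain from state 0 under a stationary randomized strategy:
    each control is drawn i.i.d. from `controls`, independently of the past."""
    x, path = 0, []
    for _ in range(n_steps):
        u = rng.choice(controls)
        nxt = x + 1 if rng.random() < p(x, x + 1, u, theta) else max(x - 1, 0)
        path.append((x, u, nxt))  # record (state, control, next state)
        x = nxt
    return path


def log_likelihood(path, theta):
    """Log-likelihood of the observed transitions under parameter theta."""
    return sum(math.log(p(i, j, u, theta)) for i, u, j in path)


def ml_estimate(path, candidates):
    """Maximum-likelihood estimate of theta over a finite candidate set."""
    return max(candidates, key=lambda th: log_likelihood(path, th))


rng = random.Random(0)
controls = [0.3, 0.6]           # illustrative compact (finite) control set
candidates = [0.2, 0.5, 0.8]    # illustrative compact (finite) parameter set
path = simulate(0.5, controls, 5000, rng)
estimate = ml_estimate(path, candidates)
```

In this toy kernel every transition probability is strictly positive for every candidate parameter, so the logarithms are always defined, and distinct values of \(\theta\) give distinct transition probabilities at every state, which mirrors the identifiability condition 3).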

zbMATH Keywords

Markov chain

control strategy

stationary randomized strategy

almost-sure optimality

ergodic cost criterion
