Elementary results on solutions to the Bellman equation of dynamic programming: existence, uniqueness, and convergence (Q2249573)

From MaRDI portal
Revision as of 05:41, 2 August 2023 by Importer (talk | contribs) (‎Created a new Item)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
scientific article
Language Label Description Also known as
English
Elementary results on solutions to the Bellman equation of dynamic programming: existence, uniqueness, and convergence
scientific article

    Statements

    Elementary results on solutions to the Bellman equation of dynamic programming: existence, uniqueness, and convergence (English)
    0 references
    2 July 2014
    0 references
    The paper studies a discrete time infinite horizon dynamic optimization problem. The main aim is to show the existence of an unique solution to the resulting Bellman equation without making topological assumptions, such as continuity of the transition correspondence and the return function. This solution is shown to be the \textit{value function} of the problem which can be found by iteration of the \textit{Bellman operator}, starting from an appropriate initial condition, asymptotically by pointwise convergence. Let \(\Gamma\) \(:X\rightarrow X\) be a nonempty valued correspondence defined on the state space \(X\). \(\Gamma \) is the \textit{transition function}. Let \(D\) denote the graph of \(\Gamma\) and \(u\) denote the \textit{return function}, where \(u:D\rightarrow[-\infty,\infty)\). Let \(\{x_t\}_{t=0}^\infty\) denote an infinite sequence of points in the set \(X\). It is a \textit{feasible path} from \(x\in X\) if for all \(t\geq 0\), \(x_{t+1}\in \Gamma(x_t)\) and \(x_0=x\). Denote by \(\Pi\) the set of all possible feasible paths starting from some initial condition \(x\in X\). Let \(\Pi(x_0)\) denote the set of all feasible paths from a particular \(x_0\in X\). Let \(\beta \geq 0\) denote the discount factor. Given \(x_{0}\in X\), consider the following optimization problem where \(L\) is either \(\liminf\) or it is \(\limsup\): \[ \sup_{\{x_{t}\}_{t=0}^{\infty }\in\Pi(x_0)}L_{T\uparrow\infty}\sum_{t=0}^T\beta^tu(x_t,x_{t+1}) \] Define the function \(S\) over \(\Pi\) as follows: for \(\{x_{t}\}_{t=0}^\infty\in\Pi\), \(S(\{x_t\}_{t=0}^\infty)= L_{T\uparrow\infty}\sum_{t=0}^T\beta^tu(x_t,x_{t+1})\). The value function \(v^\ast\) over \(X\) is defined by: for any \(x_0\in X\), \(v^\ast(x_0)=\sup_{\{x_t\}_{t=0}^\infty\in\Pi(x_0)}S(\{x_t\}_{t=0}^\infty)\). Denote, respectively, by \(\Pi^0\) and \(\Pi^0(x_0)\) the subsets of \(\Pi\) and \(\Pi(x_0)\) over which the value of \(S\) is \(>-\infty\). Let \(V\) denote the set of functions from \(X\) to \([-\infty,\infty)\). The Bellman operator \(B\) on \(V\) is defined by \[ (Bv)(x)=\sup_{y\in\Gamma(x)}\{u(x,y)+\beta v(y)\},\quad x\in X,\, v\in V \] If \(v,w\) are functions defined over the same domain with range in \([-\infty,\infty)\), the partial order \(v\leq w\) is understood to mean that for each point \(x\) in the domain, \(v(x)\leq w(x)\). For \(v\leq w\), the \textit{order interval} \([v,w]=\{f\in V:v\leq f\leq w\}\). The main Theorem is as follows: Suppose that there exist \(\underline{v}\), \(\overline{v}\in V\) such that: \(\underline{v}\leq \overline{v};\) \(B\underline{v}\geq \underline{v}\); \(B\overline{v}\leq \overline{v}\) and, moreover, (i) for all \(\{x_t\}_{t=0}^\infty\in \Pi ^0,\liminf_{t\uparrow \infty }\beta^t\underline{v}(x_t)\geq 0\), (ii) for all \(\{x_t\}_{t=0}^\infty\in\Pi\), \(\limsup_{t\uparrow \infty }\beta^t\overline{v}(x_t)\leq 0\) then \(B\) has an unique fixed point in \([\underline{v}\), \(\overline{v}]\) which is the value function \(v^\ast\) and, furthermore, the increasing sequence \(\{B^n\underline{v}\}_{n=1}^\infty\) converges to \(v^\ast\) pointwise. The conditions (i) and (ii) referred to in the hypothesis are similar to the familiar transversality conditions in the literature. The author uses the Knaster-Tarski fixed point theorem instead of a variant of the contraction mapping theorem as is customary in the literature and imposes a minimal set of conditions to get the result. This turns out to avoid the imposition of any topological assumptions, as was noted earlier. Because no topological assumptions are used, the result applies to scenarios where discontinuities may be present. Some examples are discussed as well as some further implications of the method of proof employed.
    0 references
    0 references
    dynamic programming
    0 references
    Bellman equation
    0 references
    value function
    0 references
    fixed point
    0 references
    iterative convergence
    0 references