Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems (Q2063829)

    3 January 2022
The paper proposes a bias-policy iteration method for the data-driven optimal control of unknown continuous-time linear systems; the novelty is that the requirement of an initial admissible (stabilizing) controller is relaxed. The setting is the design of the optimal controller \(u(t)=u^*(t)\) which not only stabilizes the system \[ \dot{x}=Ax+Bu, \] where \(x \in \mathbb{R}^n\), \(u \in \mathbb{R}^m\), and the matrix pair \((A,B)\) is unknown but stabilizable, but also minimizes the performance index \(J(x,u)\), i.e. \[ u^*(t)=\arg\min_u J(x,u), \quad \mbox{with } J(x,u)=\int_{0}^{\infty} \big( x^TQx+u^TRu\big)\,dt, \] where \(Q\geq 0\) and \(R>0\) are weighting matrices such that \((Q,A)\) is observable. By linear optimal control theory [\textit{D. P. Bertsekas}, Dynamic programming and optimal control. Vol. 1 u. 2. Belmont, MA: Athena Scientific (1995; Zbl 0904.90170)], the optimal controller is expressed through the unique stabilizing solution of an algebraic Riccati equation, which is usually difficult to solve directly.

The paper builds on the observation that adaptive dynamic programming (ADP) techniques fall mainly into two categories: policy iteration (PI), which starts from an initial admissible control policy, and value iteration (VI), which starts from an initial proper performance index function. To remove the constraint of an initial admissible controller in traditional PI methods, the paper combines PI and VI and proposes the bias-policy iteration (Bias-PI) method for the optimal control of unknown continuous-time systems. The proposed method is similar to the \(\lambda\)-PI method of [\textit{D. P. Bertsekas}, Lambda-policy iteration: a review and a new implementation. Lab. Report LIDS-P-2874]. Simulation examples and a comparison of the Bias-PI method with existing results such as [\textit{D. Vrabie} et al., Automatica 45, No. 2, 477--484 (2009; Zbl 1158.93354)] are provided.
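For context, the following are the standard model-based relations behind the above problem, not the paper's data-driven algorithm: the LQR problem is solved by \(u^*(t)=-Kx(t)\) with \(K=R^{-1}B^TP^*\), where \(P^*=(P^*)^T\geq 0\) is the stabilizing solution of the algebraic Riccati equation \[ A^TP^*+P^*A+Q-P^*BR^{-1}B^TP^*=0. \] Traditional PI in Kleinman's sense starts from a stabilizing gain \(K_0\) and repeats, for \(i=0,1,\dots\), \[ (A-BK_i)^TP_i+P_i(A-BK_i)+Q+K_i^TRK_i=0, \qquad K_{i+1}=R^{-1}B^TP_i, \] which requires both knowledge of \((A,B)\) and an admissible \(K_0\); the Bias-PI method of the paper works from measured data for unknown \((A,B)\) and is aimed precisely at relaxing the admissibility requirement on the initial controller.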
    adaptive dynamic programming
    policy iteration
    unknown systems
    optimal control
    data-driven control