Fault-tolerant control based on reinforcement learning and sliding event-triggered mechanism for a class of unknown discrete-time systems (Q6076441)

From MaRDI portal
scientific article; zbMATH DE number 7741191

    Statements

    Fault-tolerant control based on reinforcement learning and sliding event-triggered mechanism for a class of unknown discrete-time systems (English)
    21 September 2023
    A class of discrete-time systems under consideration is described by \[ x(k+1)=f( x(k), \widetilde{x}(k-1), u_f(k), \widetilde{u}_f(k-1) ). \tag{1} \] Here \(f(\cdot)\) is an unknown function, \(x(k)\in \mathbb{R}\) is the measured output, \(u_f(k)\in \mathbb{R}\) is the control effort or input, \(\widetilde{x}(k-1)=[x(k-1), x(k-2), \dots , x(k-m_x)]\) and \(\widetilde{u}_f(k-1)=[u_f(k-1), u_f(k-2), \dots , u_f(k-m_u)]\). The system orders \(m_x\) and \(m_u\) are unknown. The function \(f(\cdot)\) is assumed to be continuous with respect to the current control effort \(u_f(k)\) over the operating region, despite the presence of an uncertain derivative \(\partial f(\cdot)/ \partial u_f(k) = \xi_f(k)\), which for every \(k\) satisfies the inequality \(\xi_f^m<|\xi_f(k)|<\xi_f^M\) with positive constants \(\xi_f^m\) and \(\xi_f^M\). \textit{The actuator faults} include five common types, i.e., loss-of-effectiveness fault, additive fault, stuck fault, mixing failure and non-activated fault. The control effort \(u_f(k)\) in (1) is formulated as \[ u_f(k)=\nu_u(k)u(k)+\nu_d(k), \tag{2} \] where \(u(k) \in \mathbb{R}\) is the actual control effort of the actuator, \(\nu_u(k)\) is the decreasing (multiplicative) control gain and \(\nu_d(k)\) is the additive fault disturbance. Combinations of the three variants \(\nu_u(k)=1\), \(0<\nu_u(k)<1\), \(\nu_u(k)=0\) with the two variants \(\nu_d(k)=0\), \(\nu_d(k)\neq 0\) determine the five typical faults mentioned above or their absence. To design the optimal controller, \textit{the tracking error} \(e(k)\) is defined as \(e(k)=x_d(k)-x(k)\), where \(x_d(k) \in \mathbb{R}\) is the desired trajectory. \textit{The cost function} is \(J(k)=\sum_{i=1}^{\infty}\gamma^i r(k+i)\), where \(\gamma\) is the discount factor and \(r(j)\) for \(j=k+i\) is the utility function defined as \(r(j) = qe^2(j) + pu^2(j)\) with positive constants \(q\) and \(p\). Unfolding this sum one step yields the recursion \(J(k) = \gamma r(k + 1) + \gamma J(k + 1)\).
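The fault model (2) and the six combinations of \(\nu_u\) and \(\nu_d\) can be sketched as follows; this is an illustrative reconstruction, not the authors' code, and the numeric values of \(\nu_u\) and \(\nu_d\) in each case are hypothetical.

```python
# Sketch of the actuator-fault model (2): u_f(k) = nu_u(k)*u(k) + nu_d(k).
# The five typical fault types (plus the fault-free case) arise from
# combinations of the multiplicative gain nu_u and the additive term nu_d.

def faulty_actuator(u, nu_u, nu_d):
    """Apply the multiplicative/additive actuator fault to the commanded input u."""
    return nu_u * u + nu_d

# Illustrative (nu_u, nu_d) pairs; the labels follow the review, the numbers do not
# come from the paper.
fault_cases = {
    "fault-free":            (1.0, 0.0),  # nu_u = 1,     nu_d = 0
    "loss of effectiveness": (0.6, 0.0),  # 0 < nu_u < 1, nu_d = 0
    "additive fault":        (1.0, 0.3),  # nu_u = 1,     nu_d != 0
    "mixing failure":        (0.6, 0.3),  # 0 < nu_u < 1, nu_d != 0
    "stuck fault":           (0.0, 0.3),  # nu_u = 0, output stuck at nu_d
    "non-activated fault":   (0.0, 0.0),  # nu_u = 0,     nu_d = 0
}

for name, (nu_u, nu_d) in fault_cases.items():
    print(f"{name}: u_f = {faulty_actuator(1.0, nu_u, nu_d)}")
```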
\textit{The target of the work} is to establish the control law \(u^*(k)\) corresponding to \(u^*(k) = \arg \min_u \{\gamma r(k + 1) + \gamma J(k + 1)\}\). \textit{The sliding surface} is given by \(s(k) = e(k) + \gamma_s e(k-1)\), where \(0<\gamma_s<1\). The sliding condition \(|s(k)|<|s(k-1)|\) yields the triggered function \(E_{Tr}(k) = |s(k)| - |s(k-1)|\), which is required to be negative. Thereafter, \textit{the triggered mechanism} is established by \(k_{i+1} = k_i + \min_{\delta_k} \{\delta_k\in \mathbb{N}^+ \mid E_{Tr}(k_i+\delta_k)>\varkappa_{Tr}\}\). Here \(k_i\) \((i = 1, 2,\dots)\) denotes the \(i\)-th event-triggered instant, and \(\varkappa_{Tr}\) is a constant design parameter, taken negative to achieve a balance between reduced data transmission and favorable closed-loop performance. Then \textit{a critic network} is constructed to estimate the cost function \(J(k)\), and a corresponding learning law is formulated. \textit{The actor network} is established as the direct controller, whose input is the tracking error \(e(k)\) and whose output is the control effort \(u(k)\). It is proved in \textit{the central theorem} that, under the stated assumptions, the introduced control law together with the triggered mechanism and the two learning laws guarantees the convergence of the internal signals for the class of unknown discrete-time systems (1) subject to actuator faults described by (2). \textit{The results of computer experiments} demonstrate that the proposed scheme effectively improves tracking performance under all five typical faults.
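The sliding event-triggered mechanism can be sketched as follows; this is a minimal illustration of the trigger logic only, not the authors' implementation, and the values of \(\gamma_s\) and \(\varkappa_{Tr}\) are hypothetical placeholders.

```python
# Sketch of the sliding event-triggered mechanism: the sliding surface is
# s(k) = e(k) + gamma_s * e(k-1), the triggered function is
# E_Tr(k) = |s(k)| - |s(k-1)|, and an event fires once E_Tr exceeds the
# (negative) design threshold kappa_Tr.

def sliding_surface(e_k, e_prev, gamma_s=0.5):
    """Sliding surface s(k); gamma_s in (0, 1) is an assumed illustrative value."""
    return e_k + gamma_s * e_prev

def is_triggered(s_k, s_prev, kappa_Tr=-1e-3):
    """Return True when the triggered function exceeds kappa_Tr (< 0 by design).

    E_Tr(k) < 0 means the sliding condition |s(k)| < |s(k-1)| holds, so the
    controller need not be updated; an event (data transmission / control
    update) occurs only when E_Tr rises above kappa_Tr.
    """
    E_Tr = abs(s_k) - abs(s_prev)
    return E_Tr > kappa_Tr

# The surface is shrinking (|s| = 2.0 after 3.5): no event.
print(is_triggered(2.0, 3.5))
# The surface is growing (|s| = 3.5 after 2.0): event fires.
print(is_triggered(3.5, 2.0))
```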
    reinforcement learning
    fault-tolerant control
    sliding event-trigger
    discrete-time systems
    fuzzy rules emulated networks
