Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems (Q2063829)

    3 January 2022
The paper proposes a bias-policy iteration method for the data-driven optimal control of unknown continuous-time linear systems; the novelty is that the requirement of an initial admissible (stabilizing) controller is relaxed. The setting is the design of the optimal controller \(u(t)=u^*(t)\) which not only stabilizes the system \[ \dot{x}=Ax+Bu, \] where \(x \in \mathbb{R}^n\), \(u \in \mathbb{R}^m\), and the matrix pair \((A,B)\) is unknown but stabilizable, but also minimizes the performance index \(J(x,u)\), i.e. \[ u^*(t)=\arg\min_u J(x,u), \quad \mbox{with } J(x,u)=\int_{0}^{\infty} \big( x^TQx+u^TRu\big)\,dt, \] where \(Q\geq 0\) and \(R>0\) are weighting matrices such that \((Q,A)\) is observable. By linear optimal control theory [\textit{D. P. Bertsekas}, Dynamic programming and optimal control. Vol. 1 u. 2. Belmont, MA: Athena Scientific (1995; Zbl 0904.90170)], the optimal controller is expressed through the unique stabilizing solution of an algebraic Riccati equation, which is usually difficult to solve directly.

The paper builds on the observation that adaptive dynamic programming (ADP) techniques fall mainly into two categories: policy iteration (PI), which starts from an initial admissible control policy, and value iteration (VI), which starts from an initial proper performance index function. To remove the constraint of an initial admissible controller in traditional PI methods, the paper combines PI and VI and proposes the bias-policy iteration (Bias-PI) method for the optimal control of unknown continuous-time systems. The proposed method is similar to the \(\lambda\)-PI method of [\textit{D. P. Bertsekas}, Lambda-policy iteration: a review and a new implementation. Lab. Report LIDS-P-2874]. Simulation examples and a comparison of the Bias-PI method with existing results such as [\textit{D. Vrabie} et al., Automatica 45, No. 2, 477--484 (2009; Zbl 1158.93354)] are provided.
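For context, the following are the standard model-based relations behind the above problem, not the paper's data-driven algorithm: the LQR problem is solved by \(u^*(t)=-Kx(t)\) with \(K=R^{-1}B^TP^*\), where \(P^*=(P^*)^T\geq 0\) is the stabilizing solution of the algebraic Riccati equation \[ A^TP^*+P^*A+Q-P^*BR^{-1}B^TP^*=0. \] Traditional PI in Kleinman's sense starts from a stabilizing gain \(K_0\) and repeats, for \(i=0,1,\dots\), \[ (A-BK_i)^TP_i+P_i(A-BK_i)+Q+K_i^TRK_i=0, \qquad K_{i+1}=R^{-1}B^TP_i, \] which requires both knowledge of \((A,B)\) and an admissible \(K_0\); the Bias-PI method of the paper works from measured data for unknown \((A,B)\) and is aimed precisely at relaxing the admissibility requirement on the initial controller.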
    adaptive dynamic programming
    policy iteration
    unknown systems
    optimal control
    data-driven control