scientific article; zbMATH DE number 1321699
Publication:4257216
zbMath: 0924.68163; MaRDI QID: Q4257216
Dimitri P. Bertsekas, John N. Tsitsiklis
Publication date: 9 August 1999
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science (68-01); Learning and adaptive systems in artificial intelligence (68T05)
Related Items (only showing first 100 items)
Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage ⋮ Randomized Shortest-Path Problems: Two Related Models ⋮ Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems ⋮ Model-free algorithm for consensus of discrete-time multi-agent systems using reinforcement learning method ⋮ Deep empirical risk minimization in finance: Looking into the future ⋮ A Lyapunov characterization of robust policy optimization ⋮ Adaptive optimal control of continuous-time nonlinear affine systems via hybrid iteration ⋮ Zero‐sum game optimal control for the nonlinear switched systems based on heuristic dynamic programming ⋮ Parameter estimation in a 3‐parameter p‐star random graph model ⋮ Optimal transmission strategy for multiple Markovian fading channels: existence, structure, and approximation ⋮ Optimal control of a two‐wheeled self‐balancing robot by reinforcement learning ⋮ Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning ⋮ Optimal output tracking control of linear discrete-time systems with unknown dynamics by adaptive dynamic programming and output feedback ⋮ Solving nonlinear and dynamic programming equations on extended \(b\)-metric spaces with the fixed-point technique ⋮ SOS-based policy iteration for H∞ control of polynomial systems with uncertain parameters ⋮ Solving large-scale dynamic vehicle routing problems with stochastic requests ⋮ Dynamic parcel pick-up routing problem with prioritized customers and constrained capacity via lower-bound-based rollout approach ⋮ Optimized ensemble value function approximation for dynamic programming ⋮ A reinforcement learning approach to the stochastic cutting stock problem ⋮ Certified reinforcement learning with logic guidance ⋮ Reinforcement Learning, Bit by Bit ⋮ A simple illustration of interleaved learning using Kalman
filter for linear least squares ⋮ Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning ⋮ Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games ⋮ A stochastic contraction mapping theorem ⋮ Separation of learning and control for cyber-physical systems ⋮ Distributed consensus-based multi-agent temporal-difference learning ⋮ Optimal decision-making of mutual fund temporary borrowing problem via approximate dynamic programming ⋮ Convergence of gradient algorithms for nonconvex \(C^{1+ \alpha}\) cost functions ⋮ State-flipped control and Q-learning for finite horizon output tracking of Boolean control networks ⋮ Premium control with reinforcement learning ⋮ Event-triggered optimal control for discrete-time multi-player non-zero-sum games using parallel control ⋮ Improving reinforcement learning algorithms: Towards optimal learning rate policies ⋮ Primal-Dual Regression Approach for Markov Decision Processes with General State and Action Spaces ⋮ $Q$-Learning in a Stochastic Stackelberg Game between an Uninformed Leader and a Naive Follower ⋮ LQG Online Learning ⋮ Risk-Sensitive Reinforcement Learning ⋮ REINFORCEMENT LEARNING WITH GOAL-DIRECTED ELIGIBILITY TRACES ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ Some operations research methods for analyzing protein sequences and structures ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ Event-triggered integral reinforcement learning for nonzero-sum games with asymmetric input saturation ⋮ Policy search for active fault diagnosis with partially observable state ⋮ Mathematical programming for network revenue management revisited ⋮ A sensitivity formula for risk-sensitive cost and the actor-critic algorithm ⋮ A Relational Hierarchical Model for Decision-Theoretic Assistance ⋮ Minimising average passenger waiting time in personal rapid transit systems ⋮ Value 
iteration for LQR control of unknown stochastic-parameter linear systems ⋮ Anderson acceleration for partially observable Markov decision processes: a maximum entropy approach ⋮ Decentralized fused-learner architectures for Bayesian reinforcement learning ⋮ A Q-learning algorithm for Markov decision processes with continuous state spaces ⋮ Analyzing risky choices: Q-learning for deal-no-deal ⋮ Small-disturbance input-to-state stability of perturbed gradient flows: applications to LQR problem ⋮ Sum-of-squares-based policy iteration for suboptimal control of polynomial time-varying systems ⋮ Finite-horizon Q-learning for discrete-time zero-sum games with application to \(H_{\infty}\) control ⋮ Convergence of entropy-regularized natural policy gradient with linear function approximation ⋮ Accelerated zero-order SGD method for solving the black box optimization problem under "overparametrization" condition ⋮ Management of resource sharing in emergency response using data-driven analytics ⋮ Integral reinforcement learning solutions for a synchronisation system with constrained policies ⋮ Maintenance optimization in a digital twin for industry 4.0 ⋮ Nearly optimal fixed time sliding mode controller for leader-follower consensus problem with partially unknown nonlinear agents ⋮ Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization ⋮ Maximizing the probability of visiting a set infinitely often for a Markov decision process with Borel state and action spaces ⋮ Entropic risk for turn-based stochastic games ⋮ Combining learning and control in linear systems ⋮ The "black-box" optimization problem: zero-order accelerated stochastic method via kernel approximation ⋮ Deep spatial Q-learning for infectious disease control ⋮ Power and delay optimisation in multi-hop wireless networks ⋮ On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes ⋮ Empirical Q-Value Iteration ⋮ Incremental Quasi-Subgradient Method
for Minimizing Sum of Geodesic Quasi-Convex Functions on Riemannian Manifolds with Applications ⋮ Multiply Accelerated Value Iteration for NonSymmetric Affine Fixed Point Problems and Application to Markov Decision Processes ⋮ Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints ⋮ Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities ⋮ Distributed Stochastic Optimization with Large Delays ⋮ Analyzing Approximate Value Iteration Algorithms ⋮ Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms ⋮ Approximate policy iteration: a survey and some new methods ⋮ A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications ⋮ Generalized maximum entropy estimation ⋮ Algorithms for Optimal Control of Stochastic Switching Systems ⋮ ExpertRNA: A New Framework for RNA Secondary Structure Prediction ⋮ Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization ⋮ Scalable Reinforcement Learning for Multiagent Networked Systems ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Stochastic Learning Approach for Binary Optimization: Application to Bayesian Optimal Design of Experiments ⋮ Discrete-time dynamic graphical games: model-free reinforcement learning solution ⋮ Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning ⋮ Unnamed Item ⋮ From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming ⋮ Asymptotics of Reinforcement Learning with Neural Networks ⋮ Markov Reward Models and Markov Decision Processes in Discrete and Continuous Time: Performance Evaluation and Optimization ⋮ Multiple-sets split quasi-convex feasibility problems: Adaptive subgradient methods with convergence guarantee ⋮ Automated Reinforcement Learning 
(AutoRL): A Survey and Open Problems ⋮ Flexible FOND Planning with Explicit Fairness Assumptions ⋮ Risk-Sensitive Reinforcement Learning via Policy Gradient Search ⋮ Dynamic Stochastic Matching Under Limited Time ⋮ Unnamed Item ⋮ Unnamed Item