scientific article; zbMATH DE number 1321699

From MaRDI portal

Publication:4257216

zbMath: 0924.68163; MaRDI QID: Q4257216

Dimitri P. Bertsekas, John N. Tsitsiklis

Publication date: 9 August 1999


Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.





Related Items (only showing first 100 items)

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
Randomized Shortest-Path Problems: Two Related Models
Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems
Model-free algorithm for consensus of discrete-time multi-agent systems using reinforcement learning method
Deep empirical risk minimization in finance: Looking into the future
A Lyapunov characterization of robust policy optimization
Adaptive optimal control of continuous-time nonlinear affine systems via hybrid iteration
Zero-sum game optimal control for the nonlinear switched systems based on heuristic dynamic programming
Parameter estimation in a 3-parameter p-star random graph model
Optimal transmission strategy for multiple Markovian fading channels: existence, structure, and approximation
Optimal control of a two-wheeled self-balancing robot by reinforcement learning
Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
Optimal output tracking control of linear discrete-time systems with unknown dynamics by adaptive dynamic programming and output feedback
Solving nonlinear and dynamic programming equations on extended \(b\)-metric spaces with the fixed-point technique
SOS-based policy iteration for H control of polynomial systems with uncertain parameters
Solving large-scale dynamic vehicle routing problems with stochastic requests
Dynamic parcel pick-up routing problem with prioritized customers and constrained capacity via lower-bound-based rollout approach
Optimized ensemble value function approximation for dynamic programming
A reinforcement learning approach to the stochastic cutting stock problem
Certified reinforcement learning with logic guidance
Reinforcement Learning, Bit by Bit
A simple illustration of interleaved learning using Kalman filter for linear least squares
Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games
A stochastic contraction mapping theorem
Separation of learning and control for cyber-physical systems
Distributed consensus-based multi-agent temporal-difference learning
Optimal decision-making of mutual fund temporary borrowing problem via approximate dynamic programming
Convergence of gradient algorithms for nonconvex \(C^{1+ \alpha}\) cost functions
State-flipped control and Q-learning for finite horizon output tracking of Boolean control networks
Premium control with reinforcement learning
Event-triggered optimal control for discrete-time multi-player non-zero-sum games using parallel control
Improving reinforcement learning algorithms: Towards optimal learning rate policies
Primal-Dual Regression Approach for Markov Decision Processes with General State and Action Spaces
\(Q\)-Learning in a Stochastic Stackelberg Game between an Uninformed Leader and a Naive Follower
LQG Online Learning
Risk-Sensitive Reinforcement Learning
REINFORCEMENT LEARNING WITH GOAL-DIRECTED ELIGIBILITY TRACES
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Some operations research methods for analyzing protein sequences and structures
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Event-triggered integral reinforcement learning for nonzero-sum games with asymmetric input saturation
Policy search for active fault diagnosis with partially observable state
Mathematical programming for network revenue management revisited
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
A Relational Hierarchical Model for Decision-Theoretic Assistance
Minimising average passenger waiting time in personal rapid transit systems
Value iteration for LQR control of unknown stochastic-parameter linear systems
Anderson acceleration for partially observable Markov decision processes: a maximum entropy approach
Decentralized fused-learner architectures for Bayesian reinforcement learning
A Q-learning algorithm for Markov decision processes with continuous state spaces
Analyzing risky choices: Q-learning for deal-no-deal
Small-disturbance input-to-state stability of perturbed gradient flows: applications to LQR problem
Sum-of-squares-based policy iteration for suboptimal control of polynomial time-varying systems
Finite-horizon Q-learning for discrete-time zero-sum games with application to \(H_{\infty}\) control
Convergence of entropy-regularized natural policy gradient with linear function approximation
Accelerated zero-order SGD method for solving the black box optimization problem under ``overparametrization'' condition
Management of resource sharing in emergency response using data-driven analytics
Integral reinforcement learning solutions for a synchronisation system with constrained policies
Maintenance optimization in a digital twin for industry 4.0
Nearly optimal fixed time sliding mode controller for leader-follower consensus problem with partially unknown nonlinear agents
Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
Maximizing the probability of visiting a set infinitely often for a Markov decision process with Borel state and action spaces
Entropic risk for turn-based stochastic games
Combining learning and control in linear systems
The ``black-box'' optimization problem: zero-order accelerated stochastic method via kernel approximation
Deep spatial Q-learning for infectious disease control
Power and delay optimisation in multi-hop wireless networks
On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes
Empirical Q-Value Iteration
Incremental Quasi-Subgradient Method for Minimizing Sum of Geodesic Quasi-Convex Functions on Riemannian Manifolds with Applications
Multiply Accelerated Value Iteration for NonSymmetric Affine Fixed Point Problems and Application to Markov Decision Processes
Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints
Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities
Distributed Stochastic Optimization with Large Delays
Analyzing Approximate Value Iteration Algorithms
Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
Approximate policy iteration: a survey and some new methods
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
Generalized maximum entropy estimation
Algorithms for Optimal Control of Stochastic Switching Systems
ExpertRNA: A New Framework for RNA Secondary Structure Prediction
Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
Scalable Reinforcement Learning for Multiagent Networked Systems
Unnamed Item
Unnamed Item
Stochastic Learning Approach for Binary Optimization: Application to Bayesian Optimal Design of Experiments
Discrete-time dynamic graphical games: model-free reinforcement learning solution
Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning
Unnamed Item
From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming
Asymptotics of Reinforcement Learning with Neural Networks
Markov Reward Models and Markov Decision Processes in Discrete and Continuous Time: Performance Evaluation and Optimization
Multiple-sets split quasi-convex feasibility problems: Adaptive subgradient methods with convergence guarantee
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
Flexible FOND Planning with Explicit Fairness Assumptions
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Dynamic Stochastic Matching Under Limited Time
Unnamed Item
Unnamed Item






