scientific article; zbMATH DE number 1321699

From MaRDI portal
Publication:4257216

zbMath: 0924.68163 · MaRDI QID: Q4257216

Dimitri P. Bertsekas, John N. Tsitsiklis

Publication date: 9 August 1999

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Related Items

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage, Randomized Shortest-Path Problems: Two Related Models, Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems, Model-free algorithm for consensus of discrete-time multi-agent systems using reinforcement learning method, Deep empirical risk minimization in finance: Looking into the future, A Lyapunov characterization of robust policy optimization, Adaptive optimal control of continuous-time nonlinear affine systems via hybrid iteration, Zero‐sum game optimal control for the nonlinear switched systems based on heuristic dynamic programming, Parameter estimation in a 3‐parameter p‐star random graph model, Optimal transmission strategy for multiple Markovian fading channels: existence, structure, and approximation, Optimal control of a two‐wheeled self‐balancing robot by reinforcement learning, Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning, Optimal output tracking control of linear discrete-time systems with unknown dynamics by adaptive dynamic programming and output feedback, Solving nonlinear and dynamic programming equations on extended \(b\)-metric spaces with the fixed-point technique, SOS-based policy iteration for \(H_\infty\) control of polynomial systems with uncertain parameters, Solving large-scale dynamic vehicle routing problems with stochastic requests, Dynamic parcel pick-up routing problem with prioritized customers and constrained capacity via lower-bound-based rollout approach, Optimized ensemble value function approximation for dynamic programming, A reinforcement learning approach to the stochastic cutting stock problem, Certified reinforcement learning with logic guidance, Reinforcement Learning, Bit by Bit, A simple illustration of interleaved learning using Kalman filter for linear least squares, Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning, Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games, A stochastic contraction mapping theorem, Separation of learning and control for cyber-physical systems, Distributed consensus-based multi-agent temporal-difference learning, Optimal decision-making of mutual fund temporary borrowing problem via approximate dynamic programming, Convergence of gradient algorithms for nonconvex \(C^{1+ \alpha}\) cost functions, State-flipped control and Q-learning for finite horizon output tracking of Boolean control networks, Premium control with reinforcement learning, Event-triggered optimal control for discrete-time multi-player non-zero-sum games using parallel control, Improving reinforcement learning algorithms: Towards optimal learning rate policies, Primal-Dual Regression Approach for Markov Decision Processes with General State and Action Spaces, $Q$-Learning in a Stochastic Stackelberg Game between an Uninformed Leader and a Naive Follower, LQG Online Learning, Risk-Sensitive Reinforcement Learning, REINFORCEMENT LEARNING WITH GOAL-DIRECTED ELIGIBILITY TRACES, Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes, Some operations research methods for analyzing protein sequences and structures, Mathematical programming for network revenue management revisited, A 
sensitivity formula for risk-sensitive cost and the actor-critic algorithm, A Relational Hierarchical Model for Decision-Theoretic Assistance, Minimising average passenger waiting time in personal rapid transit systems, Power and delay optimisation in multi-hop wireless networks, On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes, Empirical Q-Value Iteration, Incremental Quasi-Subgradient Method for Minimizing Sum of Geodesic Quasi-Convex Functions on Riemannian Manifolds with Applications, Multiply Accelerated Value Iteration for NonSymmetric Affine Fixed Point Problems and Application to Markov Decision Processes, Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints, Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities, Distributed Stochastic Optimization with Large Delays, Analyzing Approximate Value Iteration Algorithms, Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization, Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems, A constrained optimization perspective on actor-critic algorithms and application to network routing, Potential-based least-squares policy iteration for a parameterized feedback control system, Adaptive importance sampling for control and inference, The factored policy-gradient planner, Practical solution techniques for first-order MDPs, Approximate dynamic programming for stochastic linear control problems on compact state spaces, Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm, Computable approximations for continuous-time Markov decision processes on Borel spaces based on empirical measures, Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming, Multiscale Q-learning with linear function approximation, Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design, Strategy optimization for controlled Markov process with descriptive complexity constraint, Modeling and optimization control of a demand-driven, conveyor-serviced production station, Solving stochastic resource-constrained project scheduling problems by closed-loop approximate dynamic programming, On solving the Lagrangian dual of integer programs via an incremental approach, New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system, Neural network and regression spline value function approximations for stochastic dynamic programming, Approximate dynamic programming for the dispatch of military medical evacuation assets, A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces, Perspectives of approximate dynamic programming, Low-discrepancy sampling for approximate dynamic programming with local approximators, Efficient model-based reinforcement learning for approximate online optimal control, Planning for multiple measurement channels in a continuous-state POMDP, An approximate dynamic programming framework for modeling global climate policy under decision-dependent uncertainty, Q-learning and policy iteration algorithms for stochastic shortest path problems, Model-free \(H_{\infty }\) control 
design for unknown linear discrete-time systems via Q-learning with LMI, General time consistent discounting, Suboptimal solutions to dynamic optimization problems via approximations of the policy functions, Performance evaluation of direct heuristic dynamic programming using control-theoretic measures, Robust adaptive dynamic programming for linear and nonlinear systems: an overview, An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes, Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model, Moneybarl: exploiting pitcher decision-making using reinforcement learning, A comparison of global and semi-local approximation in \(T\)-stage stochastic optimization, Ranking policies in discrete Markov decision processes, Minimum and worst-case performance ratios of rollout algorithms, The optimal unbiased value estimator and its relation to LSTD, TD and MC, Model selection in reinforcement learning, Finding optimal memoryless policies of POMDPs under the expected average reward criterion, Optimal control as a graphical model inference problem, The optimal control of just-in-time-based production and distribution systems and performance comparisons with optimized pull systems, Network revenue management with inventory-sensitive bid prices and customer choice, Minimizing total tardiness in a stochastic single machine scheduling problem using approximate dynamic programming, A framework and a mean-field algorithm for the local control of spatial processes, Potentials based optimization with embedded Markov chain for stochastic constrained system, An online actor-critic algorithm with function approximation for constrained Markov decision processes, Approximate dynamic programming for capacity allocation in the service industry, Performance optimization of queueing systems with perturbation realization, Fitting piecewise linear continuous functions, Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations, Asymptotic analysis of value prediction by well-specified and misspecified models, Incremental proximal methods for large scale convex optimization, Iterative methods for the solution of a singular control formulation of a GMWB pricing problem, A dynamic programming strategy to balance exploration and exploitation in the bandit problem, Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning, Value set iteration for Markov decision processes, Decentralized MDPs with sparse interactions, Depth-based short-sighted stochastic shortest path problems, A tutorial on event-based optimization -- a new optimization framework, Control: a perspective, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Reinforcement \(Q\)-learning for optimal tracking control of linear discrete-time systems with unknown dynamics, A sparse collocation method for solving time-dependent HJB equations using multivariate \(B\)-splines, Temporal difference-based policy iteration for optimal control of stochastic systems, Newton-based stochastic optimization using \(q\)-Gaussian smoothed functional algorithms, Adaptive dynamic programming and optimal control of nonlinear nonaffine systems, Influence of temporal aggregation on strategic forest management under risk of wind damage, Approximation of Markov decision processes with general 
state space, Resource allocation in congested queueing systems with time-varying demand: an application to airport operations, Generalized decision rule approximations for stochastic programming via liftings, Data-based analysis of discrete-time linear systems in noisy environment: controllability and observability, Optimal energy allocation for linear control with packet loss under energy harvesting constraints, Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance, Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach, Optimal switching with minimum dwell time constraint, Control of multistability, \(\mathrm{H}_\infty\) control of linear discrete-time systems: off-policy reinforcement learning, Iteration complexity analysis of block coordinate descent methods, Four encounters with system identification, Integral \(Q\)-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems, Hessian matrix distribution for Bayesian policy gradient reinforcement learning, Sampled fictitious play for approximate dynamic programming, Maximizing the probability of attaining a target prior to extinction, Convergence analysis of online gradient method for BP neural networks, The Borkar-Meyn theorem for asynchronous stochastic approximations, A generic architecture for adaptive agents based on reinforcement learning, Management of water resource systems in the presence of uncertainties by nonlinear approximation techniques and deterministic sampling, Approximate dynamic programming via direct search in the space of value function approximations, Robust inversion, dimensionality reduction, and randomized sampling, A frequentist approach to mapping under uncertainty, Proximal algorithms and temporal difference methods for solving fixed point problems, Stochastic decomposition applied to large-scale hydro valleys management, Discrete-time gradient flows and law of large numbers in Alexandrov spaces, Learning with policy prediction in continuous state-action multi-agent decision processes, Strong law of large numbers for the \(L^1\)-Karcher mean, Parallelization strategies for rollout algorithms, Optimal cost almost-sure reachability in POMDPs, Solving factored MDPs using non-homogeneous partitions, A time aggregation approach to Markov decision processes, Ambiguous partially observable Markov decision processes: structural results and applications, An algorithmic approach to optimal asset liquidation problems, Joint optimization of ordering and maintenance with condition monitoring data, Active network management for electrical distribution systems: problem formulation, benchmark, and approximate solution, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning, Approximate policy optimization and adaptive control in regression models, Actor-critic algorithms for hierarchical Markov decision processes, Faster rollout search for the vehicle routing problem with stochastic demands and restocking, A policy gradient method for semi-Markov decision processes with application to call admission control, Envelope condition method with an application to default risk models, Evaluation of counterparty risk for derivatives with early-exercise features, An integrated data-driven Markov 
parameters sequence identification and adaptive dynamic programming method to design fault-tolerant optimal tracking control for completely unknown model systems, General value iteration based single network approach for constrained optimal controller design of partially-unknown continuous-time nonlinear systems, Open problems in universal induction & intelligence, Symmetric approximate linear programming for factored MDPs with application to constrained problems, The emergence of goals in a self-organizing network: a non-mentalist model of intentional actions, Model-free \(Q\)-learning designs for linear discrete-time zero-sum games with application to \(H^\infty\) control, Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment, New stochastic approximation algorithms with adaptive step sizes, Solving the dynamic ambulance relocation and dispatching problem using approximate dynamic programming, Extremum seeking of dynamical systems via gradient descent and stochastic approximation methods, Energy contracts management by stochastic programming techniques, Immediate return preference emerged from a synaptic learning rule for return maximization, Variance-constrained actor-critic algorithms for discounted and average reward MDPs, Complete stability analysis of a heuristic approximate dynamic programming control design, Approximate stochastic annealing for online control of infinite horizon Markov decision processes, A reinforcement learning approach to convoy scheduling on a contested transportation network, Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems, Real-time dynamic programming for Markov decision processes with imprecise probabilities, A rollout algorithm framework for heuristic solutions to finite-horizon stochastic dynamic programs, Reinforcement learning algorithms with function approximation: recent advances and applications, Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains, Asymptotic bias of stochastic gradient search, Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results, Approximate receding horizon approach for Markov decision processes: average reward case, Online stochastic optimization under time constraints, Modeling and optimization of M/G/1-type queueing networks: an efficient sensitivity analysis approach, Training parsers by inverse reinforcement learning, On finding global optima for the hinge fitting problem., A unified framework for stochastic optimization, Reinforcement learning for long-run average cost., Approximate dynamic programming for link scheduling in wireless mesh networks, Convergent multiple-timescales reinforcement learning algorithms in normal form games, Sensitivity-based nested partitions for solving finite-horizon Markov decision processes, Linear programming formulation for non-stationary, finite-horizon Markov decision process models, Distributed adaptive dynamic programming for data-driven optimal control, Shape constraints in economics and operations research, Approximation of discounted minimax Markov control problems and zero-sum Markov games using Hausdorff and Wasserstein distances, Finite-horizon optimal control of discrete-time linear systems with completely unknown dynamics using Q-learning, Variance minimization of parameterized Markov decision processes, Planning horizons based proactive rescheduling for stochastic resource-constrained 
project scheduling problems, Dynamic pricing for vehicle ferries: using packing and simulation to optimize revenues, Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games, Approximate dynamic programming for missile defense interceptor fire control, Deep reinforcement learning with temporal logics, Discovering hidden structure in factored MDPs, Learning output reference model tracking for higher-order nonlinear systems with unknown dynamics, Solving variational inequality and fixed point problems by line searches and potential optimization, Incremental quasi-subgradient methods for minimizing the sum of quasi-convex functions, Synergies of operations research and data mining, Optimally maintaining a Markovian deteriorating system with limited imperfect repairs, Heterogeneous trading strategies with adaptive fuzzy actor-critic reinforcement learning: a behavioral approach, Stochastic dynamic programming applied to hydrothermal power systems operation planning based on the convex hull algorithm, Approximate dynamic programming with a fuzzy parameterization, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, On solving integral equations using Markov chain Monte Carlo methods, Distributed resource allocation with binary decisions via Newton-like neural network dynamics, Adaptive importance sampling for value function approximation in off-policy reinforcement learning, Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming, Applications of stochastic modeling in air traffic management: methods, challenges and opportunities for solving air traffic problems under uncertainty, Optimal DoS attack scheduling for multi-sensor remote state estimation over interference channels, TT-QI: faster value iteration in tensor train format for stochastic optimal control, Off-policy temporal difference learning with distribution adaptation in fast mixing chains, Multi-period portfolio optimization with linear control policies, A formal framework and extensions for function approximation in learning classifier systems, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Convergence analysis of batch gradient algorithm for three classes of sigma-pi neural networks, Projected equation methods for approximate solution of large linear systems, Adaptive optimal control for continuous-time linear systems based on policy iteration, Analysis of a class of dynamic programming models for multi-stage uncertain systems, A stochastic gradient type algorithm for closed-loop problems, Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling, An approximate dynamic programming approach for the vehicle routing problem with stochastic demands, Bond management and max-min optimal control., Application of orthogonal arrays and MARS to inventory forecasting stochastic dynamic programs., Reinforcement distribution in fuzzy Q-learning, Pricing substitutable flights in airline revenue management, Resource-constrained management of heterogeneous assets with stochastic deterioration, Theoretical tools for understanding and aiding dynamic decision making, Reinforcement learning in the brain, Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands, Limitations of learning in automata-based 
systems, A maxmin policy for bond management, Stochastic dynamic programming with factored representations, Bounded-parameter Markov decision processes, Natural actor-critic algorithms, Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem, Monte Carlo \(TD(\lambda)\)-methods for the optimal control of discrete-time Markovian jump linear systems, Variable demand and multi-commodity flow in Markovian network equilibrium, Feasible methods for nonconvex nonsmooth problems with applications in green communications, Stochastic quasi-subgradient method for stochastic quasi-convex feasibility problems, On integral generalized policy iteration for continuous-time linear quadratic regulations, A stochastic games framework for verification and control of discrete time stochastic hybrid systems, A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning, Guiding exploration by pre-existing knowledge without modifying reward, Restricted gradient-descent algorithm for value-function approximation in reinforcement learning, Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Stability-constrained Markov decision processes using MPC, Model-free finite-horizon optimal tracking control of discrete-time linear systems, Neuro-optimal tracking control for a class of discrete-time nonlinear systems via generalized value iteration adaptive dynamic programming approach, Testing facility location and dynamic capacity planning for pandemics with demand uncertainty, Event-triggered constrained control with DHP implementation for nonaffine discrete-time systems, Complexity bounds for approximately solving discounted MDPs by value iterations, When control and state variations increase uncertainty: modeling and stochastic control in discrete time, Stochastic quasi-Newton with line-search regularisation, Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization, Symblicit algorithms for mean-payoff and shortest path in monotonic Markov decision processes, On the convergence of reinforcement learning with Monte Carlo exploring starts, On learning and branching: a survey, Subgradient averaging for multi-agent optimisation with different constraint sets, A stability criterion for two timescale stochastic approximation schemes, Time-optimal control of large-scale systems of systems using compositional optimization, Efficient algorithms of pathwise dynamic programming for decision optimization in mining operations, Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances, Numerically tractable optimistic bilevel problems, Neural circuits for learning context-dependent associations of stimuli, Meso-parametric value function approximation for dynamic customer acceptances in delivery routing, Dynamic focus programming: a new approach to sequential decision problems under uncertainty, A data-driven neural network approach to optimal asset allocation for target based defined contribution pension plans, Improved value iteration for neural-network-based stochastic optimal control design, Robust min-max optimal control design for systems with uncertain models: a neural dynamic programming approach, Asymptotic optimality and rates of convergence of 
quantized stationary policies in continuous-time Markov decision processes, Reinforcement learning for adaptive optimal control of continuous-time linear periodic systems, Dynamic parameters in sequential decision making, Perturbed proximal primal-dual algorithm for nonconvex nonsmooth optimization, Optimization of a special case of continuous-time Markov decision processes with compact action set, Dynamic optimization over infinite-time horizon: web-building strategy in an orb-weaving spider as a case study, Dynamic pricing and inventory control: robust vs. stochastic uncertainty models - a computational study, Blood platelet production: optimization by dynamic programming and simulation, Neuro-dynamic trading methods, Integrated condition-based maintenance and multi-item lot-sizing with stochastic demand, Convergence analysis of the deep neural networks based globalized dual heuristic programming, New algorithms of the Q-learning type, Water reservoir control under economic, social and environmental constraints, Computational bounds for elevator control policies by large scale linear programming, Dynamic speed scaling minimizing expected energy consumption for real-time tasks, Controlled sequential Monte Carlo, A conservative index heuristic for routing problems with multiple heterogeneous service facilities, From model-based control to data-driven control: survey, classification and perspective, PageRank optimization by edge selection, Approximate dynamic programming for stochastic \(N\)-stage optimization with application to optimal consumption under uncertainty, Converging marriage in honey-bees optimization and application to stochastic dynamic programming, Application of reinforcement learning to the game of Othello, Adaptive stepsize selection for tracking in a regime-switching environment, Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint, Asynchronous Lagrangian scenario decomposition, An approximate dynamic programming approach to project scheduling with uncertain resource availabilities, Efficient sampling in approximate dynamic programming algorithms, Motion planning in uncertain environments with vision-like sensors, Dynamic multi-appointment patient scheduling for radiation therapy, Dynamic modeling and control of supply chain systems: A review, A tutorial on the cross-entropy method, Basis function adaptation in temporal difference reinforcement learning, Approximate dynamic programming-based approaches for input-output data-driven control of nonlinear processes, Convergence property of gradient-type methods with non-monotone line search in the presence of perturbations, Linear stochastic approximation driven by slowly varying Markov chains, An actor-critic algorithm for constrained Markov decision processes, Boundedness of iterates in \(Q\)-learning, Learning dynamic prices in electronic retail markets with customer segmentation, Comparing heuristics for the product allocation problem in multi-level warehouses under compatibility constraints, Dynamic programming and suboptimal control: a survey from ADP to MPC, Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states, A note on linear function approximation using random projections, An approximate dynamic programming approach for comparing firing policies in a networked air defense environment, Computational aspects of optimal strategic network diffusion, MASAGE: model-agnostic sequential and adaptive 
game estimation, Data-driven optimal control with a relaxed linear program, Dynamic pricing with Bayesian demand learning and reference price effect, Deep reinforcement learning for inventory control: a roadmap, Stochastic dynamic vehicle routing in the light of prescriptive analytics: a review, An aggregation-based approximate dynamic programming approach for the periodic review model with random yield, An improvement of single-network adaptive critic design for nonlinear systems with asymmetry constraints, Dynamic pricing models for electronic business, Monte Carlo methods for pricing financial options, Reinforcement learning for distributed control and multi-player games, From reinforcement learning to optimal control: a unified framework for sequential decisions, Reinforcement learning: an industrial perspective, The role of systems biology, neuroscience, and thermodynamics in network control and learning, Stochastic approximations of constrained discounted Markov decision processes, Stochastic iterative dynamic programming: a Monte Carlo approach to dual control, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning, Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning, Multi-agent discrete-time graphical games and reinforcement learning solutions, Performance optimization for a class of generalized stochastic Petri nets, Whittle index based Q-learning for restless bandits with average reward, Dynamic decision making for graphical models applied to oil exploration, Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms, Approximate policy iteration: a survey and some new methods, A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications, Generalized maximum entropy estimation, Algorithms for Optimal Control of Stochastic Switching Systems, ExpertRNA: A New Framework for RNA Secondary Structure Prediction, Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization, Scalable Reinforcement Learning for Multiagent Networked Systems, Stochastic Learning Approach for Binary Optimization: Application to Bayesian Optimal Design of Experiments, Discrete-time dynamic graphical games: model-free reinforcement learning solution, Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning, From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming, Asymptotics of Reinforcement Learning with Neural Networks, Markov Reward Models and Markov Decision Processes in Discrete and Continuous Time: Performance Evaluation and Optimization, Multiple-sets split quasi-convex feasibility problems: Adaptive subgradient methods with convergence guarantee, Automated Reinforcement Learning (AutoRL): A Survey and Open Problems, Flexible FOND Planning with Explicit Fairness Assumptions, Risk-Sensitive Reinforcement Learning via Policy Gradient Search, Dynamic Stochastic Matching Under Limited Time, Ordinary Differential Equation Methods for Markov Decision Processes and Application to Kullback-Leibler Control Cost, A novel optimal tracking control scheme for a class of 
discrete-time nonlinear systems using generalised policy iteration adaptive dynamic programming algorithm, Mean field Markov decision processes, Stochastic switching for partially observable dynamics and optimal asset allocation, Experience replay–based output feedback Q‐learning scheme for optimal output tracking control of discrete‐time linear systems, On the Taylor Expansion of Value Functions, Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Grid-Level Energy Storage, Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems, Spare Parts Inventory Management with Substitution-Dependent Reliability, Optimal empty vehicle redistribution for hub‐and‐spoke transportation systems, Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals, Parallel Optimization Techniques for Machine Learning, On Generalized Bellman Equations and Temporal-Difference Learning, An Overview for Markov Decision Processes in Queues and Networks, Towards Min Max Generalization in Reinforcement Learning, Deep Neural Networks Algorithms for Stochastic Control Problems on Finite Horizon: Convergence Analysis, Decomposition Methods for Computing Directional Stationary Solutions of a Class of Nonsmooth Nonconvex Optimization Problems, Q-Learning for Distributionally Robust Markov Decision Processes, Adaptive Learning Algorithm Convergence in Passive and Reactive Environments, Adaptive Robust Control in Continuous Time, Robust Optimizers for Nonlinear Programming in Approximate Dynamic Programming, Control of chaotic systems by deep reinforcement learning, Reward-Modulated Hebbian Learning of Decision Making, Is Temporal Difference Learning Optimal? 
An Instance-Dependent Analysis, Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques, Finite horizon optimal control of non-linear discrete-time switched systems using adaptive dynamic programming with ε-error bound, Optimal Dynamic Treatment Regimes, Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning, Decomposition of large-scale stochastic optimal control problems, Computable approximations for average Markov decision processes in continuous time, Stability and monotone convergence of generalised policy iteration for discrete-time linear quadratic regulations, Convergence of the standard RLS method and \(UDU^T\) factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming, A rollout algorithm for the resource constrained elementary shortest path problem, A Spiking Neural Network Model of an Actor-Critic Learning Agent, Continuous-Time Robust Dynamic Programming, Opportunistic Transmission over Randomly Varying Channels, Simultaneous Optimal Control and Discrete Stochastic Sensor Selection, Value and Policy Function Approximations in Infinite-Horizon Optimization Problems, Hebbian Versus Gradient Training of ESN Actors in Closed-Loop ACD, Regularity and Stability of Feedback Relaxed Controls, New Rollout Algorithms for Combinatorial Optimization Problems, Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures, Bayesian Exploration for Approximate Dynamic Programming, Transient-State Natural Gas Transmission in Gunbarrel Pipeline Networks, A Block Successive Upper-Bound Minimization Method of Multipliers for Linearly Constrained Convex Optimization, Output‐feedback \(H_\infty\) quadratic tracking control of linear systems using reinforcement learning, Challenges in Enterprise Wide Optimization for the Process Industries, Projection algorithms with dynamic stepsize for constrained composite minimization, A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs, Suboptimal Policies for Stochastic \(N\)-Stage Optimization: Accuracy Analysis and a Case Study from Optimal Consumption, A sequential updating scheme of the Lagrange multiplier for separable convex programming, On the structure of value functions for threshold policies in queueing models, On the Computational Complexity of Minimum-Concave-Cost Flow in a Two-Dimensional Grid, Bounds for Multistage Stochastic Programs Using Supervised Learning Strategies, Q(\(\lambda\)) with Off-Policy Corrections, Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation, Robust shortest path planning and semicontractive dynamic programming, Optimal control of a class of nonlinear stochastic systems, Adaptive dynamic programming for model‐free tracking of trajectories with time‐varying parameters, Approximate dynamic programming methods for an inventory allocation problem under uncertainty, QUANTUM COMPUTATION FOR ACTION SELECTION USING REINFORCEMENT LEARNING, Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation, Convergence Rates and Decoupling in Linear Stochastic Approximation Algorithms, Stable Optimal Control and Semicontractive Dynamic Programming, FLOW SHOP SCHEDULING WITH REINFORCEMENT LEARNING, 
Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming, Finite-horizon optimal control for continuous-time uncertain nonlinear systems using reinforcement learning