Publication: 4626283

From MaRDI portal


zbMath: 1407.68009 · MaRDI QID: Q4626283

Andrew G. Barto, Richard S. Sutton

Publication date: 27 February 2019



68-01: Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science

68T05: Learning and adaptive systems in artificial intelligence


Related Items

Comments on: ``Perspectives on integer programming for time-dependent models'', Optimal Energy Shaping via Neural Approximators, Reinforcement-learning-based control of convectively unstable flows, Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning, Model-based Reinforcement Learning: A Survey, Universal and optimal coin sequences for high entanglement generation in 1D discrete time quantum walks, Solving Rubik’s cube via quantum mechanics and deep reinforcement learning, What are the Most Important Statistical Ideas of the Past 50 Years?, Comparative analysis of machine learning methods for active flow control, A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic, Strategy Complexity of Point Payoff, Mean Payoff and Total Payoff Objectives in Countable MDPs, Efficient Time-Stepping for Numerical Integration Using Reinforcement Learning, Value-Gradient Based Formulation of Optimal Control Problem and Machine Learning Algorithm, Reinforcement Learning for Linear-Convex Models with Jumps via Stability Analysis of Feedback Controls, A deep multi-agent reinforcement learning approach to solve dynamic job shop scheduling problem, Learning to schedule heuristics for the simultaneous stochastic optimization of mining complexes, Employing reinforcement learning to enhance particle swarm optimization methods, Defense and security planning under resource uncertainty and multi‐period commitments, Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems, Access control method for EV charging stations based on state aggregation and Q-learning, Toward multi-target self-organizing pursuit in a partially observable Markov game, Time series compression based on reinforcement learning, Phenotype control techniques for Boolean gene regulatory networks, Accelerating reinforcement learning with case-based model-assisted 
experience augmentation for process control, Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics, Learning-based state estimation and control using MHE and MPC schemes with imperfect models, Off‐policy integral reinforcement learning‐based optimal tracking control for a class of nonzero‐sum game systems with unknown dynamics, Active sensing with artificial neural networks, A Bayesian reinforcement learning approach in Markov games for computing near-optimal policies, Dynamic treatment regimes with interference, A Lyapunov characterization of robust policy optimization, Sailboat navigation control system based on spiking neural networks, Fast adaptive regression-based model predictive control, Combining variable neighborhood search and machine learning to solve the vehicle routing problem with crowd-shipping, Hypothesis testing in adaptively sampled data: ART to maximize power beyond \textit{iid} sampling, Formalization of methods for the development of autonomous artificial intelligence systems, Supervised Machine Learning Techniques: An Overview with Applications to Banking, Explore and Exploit with Heterotic Line Bundle Models, Three ways to solve partial differential equations with neural networks — A review, Hybrid analysis and modeling, eclecticism, and multifidelity computing toward digital twin revolution, Modified general policy iteration based adaptive dynamic programming for unknown discrete‐time linear systems, Heterotic String Model Building with Monad Bundles and Reinforcement Learning, H∞ optimal control for semi‐Markov jump linear systems via TP‐free temporal difference (λ) learning, Optimal production ramp‐up in the smartphone manufacturing industry, A differential Hebbian framework for biologically-plausible motor control, Sparse polynomial optimisation for neural network verification, Adaptive optimal control of continuous-time nonlinear affine systems via hybrid iteration, Stability analysis of optimal 
control problems with time-dependent costs, Approximate Newton Policy Gradient Algorithms, Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization, Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework, DAPath: distance-aware knowledge graph reasoning based on deep reinforcement learning, Neighbor Q‐learning based consensus control for discrete‐time multi‐agent systems, Pair-Switching Rerandomization, A conflict-directed approach to chance-constrained mixed logical linear programming, Learning reward machines: a study in partially observable reinforcement learning, Risk filtering and risk-averse control of Markovian systems subject to model uncertainty, Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning, Placing approach-avoidance conflict within the framework of multi-objective reinforcement learning, Continuous interval type‐2 fuzzy Q‐learning algorithm for trajectory tracking tasks for vehicles, Non-iterative generation of an optimal mesh for a blade passage using deep reinforcement learning, Markov decision processes with burstiness constraints, An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents, Reward-respecting subtasks for model-based reinforcement learning, Value function estimators for Feynman-Kac forward-backward SDEs in stochastic optimal control, Finite-time convergence rates of distributed local stochastic approximation, Optimal transmission strategy for multiple Markovian fading channels: existence, structure, and approximation, Safe reinforcement learning: A control barrier function optimization approach, Output‐feedback Q‐learning for discrete‐time linear H∞ tracking control: A Stackelberg game approach, A flow based formulation and a reinforcement learning based strategic oscillation for cross-dock door assignment, Optimal control problem of various 
epidemic models with uncertainty based on deep reinforcement learning, On the comparison of discounted-sum automata with multiple discount factors, Deep reinforcement learning with reference system to handle constraints for energy-efficient train control, A reinforcement learning approach to distribution-free capacity allocation for sea cargo revenue management, Cosmic Inflation and Genetic Algorithms, Block Policy Mirror Descent, Heterogeneous optimal formation control of nonlinear multi-agent systems with unknown dynamics by safe reinforcement learning, Deep reinforcement learning for adaptive mesh refinement, A unified algorithm framework for mean-variance optimization in discounted Markov decision processes, Scaling up stochastic gradient descent for non-convex optimisation, Smoothing policies and safe policy gradients, A taxonomy for similarity metrics between Markov decision processes, Deep reinforcement trading with predictable returns, Reward (Mis)design for autonomous driving, Competence-aware systems, Risk-averse optimization of reward-based coherent risk measures, Model-free finite-horizon optimal control of discrete-time two-player zero-sum games, Reinforcement learning-based optimised control for a class of second-order nonlinear dynamic systems, Empirical deep hedging, Variational actor-critic algorithms, An impossibility result in automata-theoretic reinforcement learning, Reusable contracts for safe integration of reinforcement learning in hybrid systems, \textsc{GoSafeOpt}: scalable safe exploration for global optimization of dynamical systems, Risk-aware controller for autonomous vehicles using model-based collision prediction and reinforcement learning, Is there a role for statistics in artificial intelligence?, Predicting rare events using neural networks and short-trajectory data, A stochastic maximum principle approach for reinforcement learning with parameterized environment, A Lyapunov-based version of the value iteration algorithm 
formulated as a discrete-time switched affine system, Data-driven passivity-based control of underactuated mechanical systems via interconnection and damping assignment, Adaptive importance sampling for control and inference, Using reinforcement learning to find an optimal set of features, Machine learning in agent-based stochastic simulation: inferential theory and evaluation in transportation logistics, Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task, Attack allocation on remote state estimation in multi-systems: structural results and asymptotic solution, Imitation learning of car driving skills with decision trees and random forests, Approximate-optimal control algorithm for constrained zero-sum differential games through event-triggering mechanism, Collective behavior of artificial intelligence population: transition from optimization to game, Event-based optimization approach for solving stochastic decision problems with probabilistic constraint, A convex optimization approach to dynamic programming in continuous state and action spaces, Model-free reinforcement learning for branching Markov decision processes, Tutorial series on brain-inspired computing. 
IV: Reinforcement learning: machine learning and natural learning, Multi-objective optimization of water-using systems, Real-time dynamic programming for Markov decision processes with imprecise probabilities, Perception control, A Markovian mechanism of proportional resource allocation in the incentive model as a dynamic stochastic inverse Stackelberg game, Unfazed by both the bull and bear: strategic exploration in dynamic environments, Open problems in universal induction \& intelligence, On the computability of Solomonoff induction and AIXI, Reinforcement learning for a class of continuous-time input constrained optimal control problems, Shape constraints in economics and operations research, Refinement of the four-dimensional GLV method on elliptic curves, Post-quantum static-static key agreement using multiple protocol instances, Efficient reductions in cyclotomic rings -- application to Ring LWE based FHE schemes, Reinforcement learning with via-point representation, A projected primal-dual gradient optimal control method for deep reinforcement learning, Pitfalls in quantifying exploration in reward-based motor learning and how to avoid them, Methods for improving the efficiency of swarm optimization algorithms. 
A survey, Qualitative case-based reasoning and learning, The concept of constructing an artificial dispatcher intelligent system based on deep reinforcement learning for the automatic control system of electric networks, Deep reinforcement learning with temporal logics, Clustering in block Markov chains, An online-learning-based evolutionary many-objective algorithm, Learning output reference model tracking for higher-order nonlinear systems with unknown dynamics, Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards, Adaptive learning in large populations, Models and measures of animal aggregation and dispersal, Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks, A linear programming methodology for approximate dynamic programming, Machine learning for combinatorial optimization: a methodological tour d'horizon, Deep hedging of long-term financial derivatives, On satisficing in quantitative games, Markov decision processes with dynamic transition probabilities: an analysis of shooting strategies in basketball, On the finite horizon optimal switching problem with random lag, Negotiating team formation using deep reinforcement learning, The voice of optimization, Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling, Importance sampling in reinforcement learning with an estimated behavior policy, Boltzmann distributed replicator dynamics: population games in a microgrid context, Accelerating reinforcement learning with a directional-Gaussian-smoothing evolution strategy, A reinforcement learning approach for dynamic multi-objective optimization, Neural precedence recommender, Enhancing gene expression programming based on space partition and jump for symbolic regression, Improving pest monitoring networks using a simulation-based approach to contribute to pesticide reduction, Bias-policy 
iteration based adaptive dynamic programming for unknown continuous-time linear systems, Fully asynchronous policy evaluation in distributed reinforcement learning over networks, A deep learning model for gas storage optimization, A survey of learning-based control of robotic visual servoing systems, Anticipative dynamic slotting for attended home deliveries, Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation, Superquantiles at work: machine learning applications and efficient subgradient computation, A solution to the path planning problem via algebraic geometry and reinforcement learning, Variational learning from implicit bandit feedback, Challenges of real-world reinforcement learning: definitions, benchmarks and analysis, Grounded action transformation for sim-to-real reinforcement learning, Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems, Partially observable environment estimation with uplift inference for reinforcement learning based recommendation, Model-free LQR design by Q-function learning, Reinforcement learning and stochastic optimisation, Interpretable machine learning: fundamental principles and 10 grand challenges, Deep reinforcement learning for inventory control: a roadmap, Deep Q-learning for same-day delivery with vehicles and drones, Learning to select operators in meta-heuristics: an integration of Q-learning into the iterated greedy algorithm for the permutation flowshop scheduling problem, Inverse reinforcement learning for multi-player noncooperative apprentice games, Risk-averse policy optimization via risk-neutral policy optimization, Adaptive output regulation for cyber-physical systems under time-delay attacks, FBSDE based neural network algorithms for high-dimensional quasilinear parabolic PDEs, A PAC algorithm in relative precision for bandit problem with costly sampling, Deep reinforcement learning for 
\textsf{FlipIt} security game, A taxonomy of surprise definitions, Reinforcement learning for the knapsack problem, Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization, Does lifelong learning affect mobile robot evolution?, A reinforcement learning model to inform optimal decision paths for HIV elimination, A reinforcement learning algorithm for rescheduling preempted tasks in fog nodes, Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning, Simplified risk-aware decision making with belief-dependent rewards in partially observable domains, What may lie ahead in reinforcement learning, Reinforcement learning for distributed control and multi-player games, From reinforcement learning to optimal control: a unified framework for sequential decisions, Fundamental design principles for reinforcement learning algorithms, Mixed density methods for approximate dynamic programming, Adaptive dynamic programming in the Hamiltonian-driven framework, Optimal adaptive control of partially uncertain linear continuous-time systems with state delay, Dissipativity-based verification for autonomous systems in adversarial environments, Multi-agent reinforcement learning: a selective overview of theories and algorithms, A top-down approach to attain decentralized multi-agents, Bounded rationality in learning, perception, decision-making, and stochastic games, Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning, Reinforcement learning: an industrial perspective, Cooperative receding horizon strategies for the multivehicle routing problem, Ordinary Differential Equation Methods for Markov Decision Processes and Application to Kullback--Leibler Control Cost, A Q-Learning Approach for Investment Decisions, A reinforcement 
learning approach to personalized learning recommendation systems, Accelerating Stochastic Composition Optimization, Gradient-Free Methods with Inexact Oracle for Convex-Concave Stochastic Saddle-Point Problem, Active cloaking in Stokes flows via reinforcement learning, Multilevel Composite Stochastic Optimization via Nested Variance Reduction, Statistical Inference for Online Decision Making via Stochastic Gradient Descent, Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation, Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation, Adaptive dynamic programming for model‐free tracking of trajectories with time‐varying parameters, Deeply Felt Affect: The Emergence of Valence in Deep Active Inference, Enhanced Equivalence Projective Simulation: A Framework for Modeling Formation of Stimulus Equivalence Classes, Active Inference: Demystified and Compared, Sophisticated Inference, Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients, Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation, Continuous-domain ant colony optimization algorithm based on reinforcement learning, Applying Deep Reinforcement Learning in Automated Stock Trading, Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise, Allocating resources via price management systems: a dynamic programming-based approach, Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis, Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning, A General Theory of Multi-Armed Bandit Processes with Constrained Arm Switches, Actor-Critic 
Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks, Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method, On Gradient-Based Learning in Continuous Games, Finite-horizon optimal control for continuous-time uncertain nonlinear systems using reinforcement learning, Completion of the Infeasible Actions of Others: Goal Inference by Dynamical Invariant, Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms, Scenario-Based Verification of Uncertain MDPs, Bellman's principle of optimality and deep reinforcement learning for time-varying tasks, Exploratory HJB Equations and Their Convergence, Prolog Technology Reinforcement Learning Prover, Bayesian Inference of Hidden Markov Models Using Dirichlet Mixtures, Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints, Dynamic portfolio choice: a simulation-and-regression approach, On learning and branching: a survey, Good arm identification via bandit feedback, Complexity bounds for approximately solving discounted MDPs by value iterations, Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods, Max-plus approximation for reinforcement learning, On the convergence of reinforcement learning with Monte Carlo exploring starts, A learning based algorithm for drone routing, A numerical study of Markov decision process algorithms for multi-component replacement problems, Multi-condition multi-objective optimization using deep reinforcement learning, Improve generated adversarial imitation learning with reward variance regularization, On the effect of probing noise in optimal control LQR via Q-learning using adaptive filtering algorithms, Scalable uncertainty quantification for deep operator networks using randomized priors, Logarithmic regret in online linear quadratic control using Riccati 
updates, A Lyapunov approach for stable reinforcement learning, Intelligent inventory management approaches for perishable pharmaceutical products in a healthcare supply chain, Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization, Value functions for depth-limited solving in zero-sum imperfect-information games, Linear quadratic tracking control of unknown systems: a two-phase reinforcement learning method, Optimal control by deep learning techniques and its applications on epidemic models, GPI-based design for partially unknown nonlinear two-player zero-sum games, Preparation of three-atom GHZ states based on deep reinforcement learning, Concentration inequality for U-statistics of order two for uniformly ergodic Markov chains, Optimised graded metamaterials for mechanical energy confinement and amplification via reinforcement learning, Combinatorial optimization. Abstracts from the workshop held November 7--13, 2021 (hybrid meeting), Quantum greedy algorithms for multi-armed bandits, Multi-agent reinforcement learning for decentralized stable matching, Memory-two strategies forming symmetric mutual reinforcement learning equilibrium in repeated prisoners' dilemma game, Multi-fidelity reinforcement learning framework for shape optimization, Reinforcement learning for exploratory linear-quadratic two-person zero-sum stochastic differential games, Counter-Factual Reinforcement Learning: How to Model Decision-Makers That Anticipate the Future, Artificial Intelligence Algorithms in Behavioural Control of Wheeled Mobile Robots Formation, Robust shortest path planning and semicontractive dynamic programming, Optimal Path Planning for Information Based Localization, Semantic Labelling and Learning for Parity Game Solving in LTL Synthesis, Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals, Optimal Learning with Local Nonlinear Parametric Models over Continuous 
Designs, On Generalized Bellman Equations and Temporal-Difference Learning, Active Localization of Multiple Targets from Noisy Relative Measurements, Regularity and Stability of Feedback Relaxed Controls, Reverse-Engineering Neural Networks to Characterize Their Cost Functions, Closed-Loop Deep Learning: Generating Forward Models With Backpropagation, Active Learning for Level Set Estimation Under Input Uncertainty and Its Extensions, A hybrid dynamical systems perspective on reinforcement learning for cyber-physical systems: vistas, open problems, and challenges, The role of systems biology, neuroscience, and thermodynamics in network control and learning, Quantum amplitude amplification for reinforcement learning, Adapting multi-agent swarm robotics to achieve synchronised behaviour from production line automata, Distributed consensus control for nonlinear multi-agent systems, Optimal output regulation for unknown continuous-time linear systems by internal model and adaptive dynamic programming, Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning, Reinforcement learning for multi-item retrieval in the puzzle-based storage system, Tight and loose coupling in organizations, Closing the gap: combining task specification and reinforcement learning for compliant vegetable cutting, Learning-based vs model-free adaptive control of a MAV under wind gust, The pure exploration problem with general reward functions depending on full distributions, Policies for the dynamic traveling maintainer problem with alerts, Guiding an automated theorem prover with neural rewriting, Controlled interacting particle algorithms for simulation-based reinforcement learning, Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems, Opportunities for reinforcement learning in stochastic dynamic vehicle routing, A new deep neural network algorithm for multiple stopping with 
applications in options pricing, Learning that grid-convenience does not hurt resilience in the presence of uncertainty, Batch policy learning in average reward Markov decision processes, Multi-objective dynamic programming with limited precision, Order scoring, bandit learning and order cancellations, Deep learning classification: modeling discrete labor choice, Whittle index based Q-learning for restless bandits with average reward, SeaPearl: a constraint programming solver guided by reinforcement learning, Improving branch-and-bound using decision diagrams and reinforcement learning, A tutorial on optimal control and reinforcement learning methods for quantum technologies, Predictive market making via machine learning, A derivative-free method for solving elliptic partial differential equations with deep neural networks, SAMBA: safe model-based \& active reinforcement learning, Reinforcement learning for robotic manipulation using simulated locomotion demonstrations, A new algorithm for the LQR problem with partially unknown dynamics, A noise-immune reinforcement learning method for early diagnosis of neuropsychiatric systemic lupus erythematosus, Deep reinforcement learning for the control of conjugate heat transfer, Mean-field Markov decision processes with common noise and open-loop controls, Sequencing of multi-robot behaviors using reinforcement learning, A new dissipativity condition for asymptotic stability of discounted economic MPC, Learning-driven feasible and infeasible tabu search for airport gate assignment, Reinforcement learning explains various conditional cooperation, Towards finding longer proofs, The role of entropy in guiding a connection prover, On motion camouflage as proportional navigation, Bandit and covariate processes, with finite or non-denumerable set of arms, Adaptive large neighborhood search for mixed integer programming, Self-adapting WIP parameter setting using deep reinforcement learning, A sojourn-based approach to semi-Markov 
reinforcement learning, Resilient reinforcement learning and robust output regulation under denial-of-service attacks, Unified reinforcement Q-learning for mean field game and control problems, Reinforcement learning-based design of side-channel countermeasures, Solving elliptic equations with Brownian motion: bias reduction and temporal difference learning, Generalisations of a Bayesian decision-theoretic randomisation procedure and the impact of delayed responses, Stability-constrained Markov decision processes using MPC, Self-triggered control of probabilistic Boolean control networks: a reinforcement learning approach, Model-free finite-horizon optimal tracking control of discrete-time linear systems, Deep reinforcement learning of viscous incompressible flow, Zeroth-order methods for noisy Hölder-gradient functions, Lipschitzness is all you need to tame off-policy generative adversarial imitation learning, Detect, understand, act: a neuro-symbolic hierarchical reinforcement learning framework, Semi-Lipschitz functions and machine learning for discrete dynamical systems on graphs, Policy space identification in configurable environments, Planning for potential: efficient safe reinforcement learning, Privacy-preserving estimation of an optimal individualized treatment rule: a case study in maximizing time to severe depression-related outcomes, Human motor learning is robust to control-dependent noise, Reinforcement learning with algorithms from probabilistic structure estimation, Probabilistic programming with stochastic variational message passing, Stochastic optimization for vaccine and testing kit allocation for the COVID-19 pandemic, Deep reinforcement learning for wireless sensor scheduling in cyber-physical systems, The multi-armed bandit problem: an efficient nonparametric solution, A hierarchical Bayesian approach to assess learning and guessing strategies in reinforcement learning, A systematic literature review on machine learning applications for 
sustainable agriculture supply chain performance, Reinforcement learning for adaptive optimal control of continuous-time linear periodic systems, Rescorla-Wagner models with sparse dynamic attention, Dissecting EXIT, Symplectic Runge-Kutta discretization of a regularized forward-backward sweep iteration for optimal control problems, Convergence analysis of the deep neural networks based globalized dual heuristic programming, A computational model for spatial cognition combining dorsal and ventral hippocampal place field maps: multiscale navigation, A conservative index heuristic for routing problems with multiple heterogeneous service facilities, Convergence results for an averaged LQR problem with applications to reinforcement learning, Reward is enough, Deep reinforcement learning for the optimal placement of cryptocurrency limit orders, Enhance load forecastability: optimize data sampling policy by reinforcing user behaviors, Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint, Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner's dilemma, A review on deep reinforcement learning for fluid mechanics, Cooperative and non-cooperative behaviour in the exploitation of a common renewable resource with environmental stochasticity, Output regulation of unknown linear systems using average cost reinforcement learning, General solutions for nonlinear differential equations: a rule-based self-learning approach using deep reinforcement learning, Control strategy of speed servo systems based on deep reinforcement learning, Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach, Dynamic selective maintenance optimization for multi-state systems over a finite horizon: a deep reinforcement learning approach, A reinforcement learning scheme for the equilibrium of the in-vehicle route choice problem based on congestion game, A correctness result for 
synthesizing plans with loops in stochastic domains, The Hanabi challenge: a new frontier for AI research, Deciding probabilistic bisimilarity distance one for probabilistic automata, Branes with brains: exploring string vacua with deep reinforcement learning, Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies, Adaptive cruise control via adaptive dynamic programming with experience replay, TD-regularized actor-critic methods, Non-equilibrium dynamic games and cyber-physical security: a cognitive hierarchy approach, Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design, Safety-constrained reinforcement learning with a distributional safety critic, A reinforced hybrid genetic algorithm for the traveling salesman problem, A DRL based approach for adaptive scheduling of one-of-a-kind production, Optimizing high-dimensional stochastic forestry \textit{via} reinforcement learning, A Discrete-Time Switching System Analysis of Q-Learning, Spatial state-action features for general games, Stochastic composition optimization of functions without Lipschitz continuous gradient, Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach, Dynamic demand management and online tour planning for same-day delivery, Simultaneous perception-action design via invariant finite belief sets, Learning to Optimize, The landscape of the proximal point method for nonconvex-nonconcave minimax optimization, A gradient-based reinforcement learning model of market equilibration, A review of the operations literature on real options in energy, A general deep reinforcement learning hyperheuristic framework for solving combinatorial optimization problems, Near-grazing bifurcations and deep reinforcement learning control of an impact oscillator with elastic constraints, A reinforcement learning approach to the stochastic cutting stock problem, Undiscounted 
reinforcement learning for infinite-time optimal output tracking and disturbance rejection of discrete-time LTI systems with unknown dynamics, Off‐policy model‐based end‐to‐end safe reinforcement learning, A mathematical perspective of machine learning, Dissipativity in infinite horizon optimal control and dynamic programming, Identity concealment games: how I learned to stop revealing and love the coincidences, A digital twin framework for civil engineering structures, Poster Abstract: Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives, First Steps Towards a Runtime Analysis of Neuroevolution, A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning, Personalized dynamic treatment regimes in continuous time: a Bayesian approach for optimizing clinical decisions with timing, Evolution of semi-Kantian preferences in two-player assortative interactions with complete and incomplete information and plasticity, Optimal consensus model-free control for multi-agent systems subject to input delays and switching topologies, Reduced modelling and optimal control of epidemiological individual‐based models with contact heterogeneity, Automated Deep Learning: Neural Architecture Search Is Not the End, Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge, Empirical underidentification in estimating random utility models: The role of choice sets and standardizations, Generative methods for sampling transition paths in molecular dynamics, gym-flp: a Python package for training reinforcement learning algorithms on facility layout problems, Region-based approximation in approximate dynamic programming, Control policy learning design for vehicle urban positioning via BeiDou navigation, Optimal Treatment Regimes: A Review and Empirical Comparison, Inverse optimal control for averaged cost per stage linear quadratic regulators, Online decision making for trading wind energy, 
On the sample complexity of actor-critic method for reinforcement learning with function approximation, Inverse reinforcement learning through logic constraint inference, Continuous Positional Payoffs, Discovering agents, Minimum input design for direct data-driven property identification of unknown linear systems, Reward Maximization Through Discrete Active Inference, Deep Q‐learning: A robust control approach, A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets, Minimum information divergence of Q-functions for dynamic treatment regimes, Geometry and convergence of natural policy gradient methods, Multi-agent machine learning in self-organizing systems, Adaptive operator selection with reinforcement learning, Tutorial on Amortized Optimization, Reinforcement Learning, Bit by Bit, A self‐adaptive SAC‐PID control approach based on reinforcement learning for mobile robots, Linear Convergence of a Policy Gradient Method for Some Finite Horizon Continuous Time Control Problems, Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning, A note on generalized second-order value iteration in Markov decision processes, Gradient temporal-difference learning for off-policy evaluation using emphatic weightings, Recent advances in reinforcement learning in finance, Target Network and Truncation Overcome the Deadly Triad in Q-Learning, An evolutionary estimation procedure for generalized semilinear regression trees, A policy-based learning beam search for combinatorial optimization, Quantum optimal control: practical aspects and diverse methods, Reinforcement learning‐based robust optimal output regulation for constrained nonlinear systems with static and dynamic uncertainties, Cost-aware defense for parallel server systems against reliability and security failures, Incremental reinforcement learning and optimal output regulation under unmeasurable disturbances, Pessimistic value iteration for 
multi-task data sharing in offline reinforcement learning, Exploratory machine learning with unknown unknowns, A Semiparametric Inverse Reinforcement Learning Approach to Characterize Decision Making for Mental Disorders, Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons, Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection, Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process, Modeling and Active Learning for Experiments with Quantitative-Sequence Factors, Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning, Optimal sensor scheduling for remote state estimation with limited bandwidth: a deep reinforcement learning approach, Optimal deterministic controller synthesis from steady-state distributions, A general motion control framework for an autonomous underwater vehicle through deep reinforcement learning and disturbance observers, Data-driven optimal control via linear transfer operators: a convex approach, AI-driven liquidity provision in OTC financial markets, Design of experiments for the calibration of history-dependent models via deep reinforcement learning and an enhanced Kalman filter, Multi-agent natural actor-critic reinforcement learning algorithms, Optimal transmission scheduling for remote state estimation in CPSs with energy harvesting two-hop relay networks, Deep reinforcement learning control approach to mitigating actuator attacks, Alternating good-for-MDPs automata, Learning key steps to attack deep reinforcement learning agents, Safe multi-agent reinforcement learning for multi-robot control, Contractivity of Bellman operator in risk averse dynamic programming with infinite horizon, Specification-guided reinforcement learning, Multi-agent reinforcement learning aided sampling algorithms for a class of multiscale inverse problems, Empirical Gittins index strategies with ε-explorations 
for multi-armed bandit problems, Portfolio selection with exploration of new investment assets, An online reinforcement learning approach to charging and order-dispatching optimization for an e-hailing electric vehicle fleet, Adaptive cut selection in mixed-integer linear programming, Learning through imitation by using formal verification, State-flipped control and Q-learning for finite horizon output tracking of Boolean control networks, A novel optimization perspective to the problem of designing sequences of tasks in a reinforcement learning framework, Approximated multi-agent fitted Q iteration, Premium control with reinforcement learning, The octatope abstract domain for verification of neural networks, Temperature transitions and degeneracy in the control of small clusters with a macroscopic field, Improving Variable Orderings of Approximate Decision Diagrams Using Reinforcement Learning, Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, Closed-form Approximations in Multi-asset Market Making, How humans learn and represent networks, Trading Signals in VIX Futures, SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits, Scalable Online Planning for Multi-Agent MDPs, Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning, State-Dependent Temperature Control for Langevin Diffusions, Simple and Optimal Methods for Stochastic Variational Inequalities, II: Markovian Noise and Policy Evaluation in Reinforcement Learning, Constrained, Global Optimization of Unknown Functions with Lipschitz Continuous Gradients, Queueing Network Controls via Deep Reinforcement Learning, Optimizing low-Reynolds-number predation via optimal control and reinforcement 
learning, Visual transfer for reinforcement learning via gradient penalty based Wasserstein domain confusion, Deep differentiable reinforcement learning and optimal trading, The reinforcement learning Kelly strategy, A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization, Automated Reinforcement Learning (AutoRL): A Survey and Open Problems, A Comprehensive Framework for Learning Declarative Action Models, Dynamic Learning and Decision Making via Basis Weight Vectors, Self-improving Q-learning based controller for a class of dynamical processes, Risk-Sensitive Reinforcement Learning via Policy Gradient Search, Online Mixed-Integer Optimization in Milliseconds, Bridging Commonsense Reasoning and Probabilistic Planning via a Probabilistic Action Language, Optimal control on graphs: existence, uniqueness, and long-term behavior, Reinforcement learning for suppression of collective activity in oscillatory ensembles, Transition scale-spaces: A computational theory for the discretized entorhinal cortex, Model-Free Robust Optimal Feedback Mechanisms of Biological Motor Control, Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, Managing caching strategies for stream reasoning with reinforcement learning, Algorithms for recursive delegation, Deep Reinforcement Learning: A State-of-the-Art Walkthrough, Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets, Stochastic Conditional Gradient++: (Non)Convex Minimization and Continuous Submodular Maximization, Deep Neural Networks Algorithms for Stochastic Control Problems on Finite Horizon: Convergence Analysis, Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme, Estimating Scale-Invariant Future in Continuous Time, Evaluating Strategic Structures in Multi-Agent Inverse Reinforcement Learning, Muscle Synergy–Driven 
Robust Motion Control, A Reinforcement Learning Neural Network for Robotic Manipulator Control, Active Inference, Belief Propagation, and the Bethe Approximation, Hexagonal Grid Fields Optimally Encode Transitions in Spatiotemporal Sequences, Adaptive Learning Algorithm Convergence in Passive and Reactive Environments, Using deep learning for digitally controlled STIRAP, Control of chaotic systems by deep reinforcement learning, Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis, Continuous-Time Robust Dynamic Programming, Reinforcement Learning in Spiking Neural Networks with Stochastic and Deterministic Synapses, Deep Reinforcement Learning for Market Making in Corporate Bonds: Beating the Curse of Dimensionality, Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains, Hybrid online learning control in networked multiagent systems: A survey, Open‐loop Stackelberg learning solution for hierarchical control problems, Output‐feedback H∞ quadratic tracking control of linear systems using reinforcement learning, Intelligent Human–Robot Interaction Systems Using Reinforcement Learning and Neural Networks, Model-Free Reinforcement Learning for Stochastic Parity Games, Optimizing Execution Cost Using Stochastic Control, On the Complexity of Value Iteration, The QLBS Q-Learner goes NuQLear: fitted Q iteration, inverse RL, and option portfolios, Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration, Spike-Timing-Dependent Construction, Toward Nonlinear Local Reinforcement Learning Rules Through Neuroevolution, Dopamine Ramps Are a Consequence of Reward Prediction Errors, Conditional Density Estimation with Dimensionality Reduction via Squared-Loss Conditional Entropy Minimization, A Reward-Maximizing Spiking Neuron as a Bounded Rational Decision Maker, An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions, Optimal Curiosity-Driven Modular Incremental Slow 
Feature Analysis, Per-Round Knapsack-Constrained Linear Submodular Bandits, Online Reinforcement Learning Using a Probability Density Estimation, Neural Circuits Trained with Standard Reinforcement Learning Can Accumulate Probabilistic Information during Decision Making, LQG Online Learning, Nonconvex Policy Search Using Variational Inequalities, Dopamine, Inference, and Uncertainty, Risk-Sensitive Reinforcement Learning, On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes, Uncertainty in learning, choice, and visual fixation, Induction and Exploitation of Subgoal Automata for Reinforcement Learning, Learning When-to-Treat Policies, Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health, Coordination problems on networks revisited: statics and dynamics, Reject inference methods in credit scoring, An Introduction to Learning Automata and Optimization, Learning Automaton and Its Variants for Optimization: A Bibliometric Analysis, Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis, Optimistic planning for control of hybrid-input nonlinear systems, DDQN-based optimal targeted therapy with reversible inhibitors to combat the Warburg effect, Q-learning with heterogeneous update strategy, Optimal Scheduling of Entropy Regularizer for Continuous-Time Linear-Quadratic Reinforcement Learning, Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds, Discovering efficient periodic behaviors in mechanical systems via neural approximators, Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective, Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning, Event-triggered optimal control for discrete-time multi-player non-zero-sum games using parallel control, Learning Robust Marking Policies for Adaptive Mesh Refinement, Off-policy evaluation for tabular reinforcement learning with synthetic 
trajectories, Settling the sample complexity of model-based offline reinforcement learning, Distributed web hacking by adaptive consensus-based reinforcement learning, Evolving interpretable decision trees for reinforcement learning, The buffered optimization methods for online transfer function identification employed on DEAP actuator, Underestimation estimators to Q-learning, How do people build up visual memory representations from sensory evidence? Revisiting two classic models of choice, Reinforcement learning with dynamic convex risk measures, Finite‐horizon H∞ tracking control for discrete‐time linear systems, Primal-Dual Regression Approach for Markov Decision Processes with General State and Action Spaces, Trajectory modeling via random utility inverse reinforcement learning, Verifiably safe exploration for end-to-end reinforcement learning, Accelerating Primal-Dual Methods for Regularized Markov Decision Processes, Hierarchical method for cooperative multiagent reinforcement learning in Markov decision processes, The synchronized ambient calculus, Error controlled actor-critic, Consolidation of structure of high noise data by a new noise index and reinforcement learning