\({\mathcal Q}\)-learning

From MaRDI portal
Publication:1812931

DOI: 10.1007/BF00992698 zbMath: 0773.68062 Wikidata: Q57424214 Scholia: Q57424214 MaRDI QID: Q1812931

Peter Dayan, Christopher J. C. H. Watkins

Publication date: 11 August 1992

Published in: Machine Learning
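The publication established the convergence of Watkins' Q-learning, whose tabular update rule is \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]\). A minimal sketch of that update on a hypothetical five-state chain environment (the environment and hyperparameters below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Tabular Q-learning on a hypothetical 5-state chain MDP:
# states 0..4, actions 0 = left / 1 = right, reward 1 for
# reaching the terminal state 4, reward 0 otherwise.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2  # step size, discount, exploration rate

def step(s, a):
    """One environment transition; returns (next state, reward, done)."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1), s2 == n_states - 1

for _ in range(2000):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Watkins' update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r if done else r + gamma * np.max(Q[s2])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
```

Under these assumptions the greedy policy converges to "always move right", with Q(s, right) approaching \(\gamma^{3-s}\), consistent with the paper's convergence result for the tabular case.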




Related Items

A time aggregation approach to Markov decision processes, D-learning to estimate optimal individual treatment rules, Tree-based reinforcement learning for estimating optimal dynamic treatment regimes, Safe learning for near-optimal scheduling, Model-free reinforcement learning for branching Markov decision processes, Output feedback Q-learning for discrete-time linear zero-sum games with application to the \(H_\infty\) control, Imitation guided learning in learning classifier systems, A learning classifier system for mazes with aliasing clones, Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function, Bounded rationality and search over small-world models, A Markovian mechanism of proportional resource allocation in the incentive model as a dynamic stochastic inverse Stackelberg game, Q-learning agents in a Cournot oligopoly model, Multiscale Q-learning with linear function approximation, How hierarchical models improve point estimates of model parameters at the individual level, Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design, Autonomous agents modelling other agents: a comprehensive survey and open problems, Intelligent multiple search strategy cuckoo algorithm for numerical and engineering optimization problems, Probabilistic inference for determining options in reinforcement learning, Perspectives of approximate dynamic programming, High-dimensional \(A\)-learning for optimal dynamic treatment regimes, Bifurcation mechanism design -- from optimal flat taxes to better cancer treatments, High-dimensional inference for personalized treatment decision, Linear least-squares algorithms for temporal difference learning, Feature-based methods for large scale dynamic programming, The loss from imperfect value functions in expectation-based and minimax-based tasks, The effect of representation and knowledge on goal-directed exploration with 
reinforcement-learning algorithms, Active inference and agency: optimal control without cost functions, Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning, Machine learning in agent-based stochastic simulation: inferential theory and evaluation in transportation logistics, Open problems in universal induction \& intelligence, Designing time difference learning for interference management in heterogeneous networks, Multi-objective optimization of water-using systems, Cycle frequency in standard rock-paper-scissors games: evidence from experimental economics, Learning to compose fuzzy behaviors for autonomous agents, An adaptive learning model with foregone payoff information, Testing probabilistic equivalence through reinforcement learning, Model-based average reward reinforcement learning, Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment, Model-based estimation of subjective values using choice tasks with probabilistic feedback, Energy management for stationary electric energy storage systems: a systematic literature review, A boundedness result for the direct heuristic dynamic programming, Designing decentralized controllers for distributed-air-jet MEMS-based micromanipulators by reinforcement learning, A human-robot collaborative reinforcement learning algorithm, Distributed reinforcement learning for coordinate multi-robot foraging, Automatic generation of fuzzy inference systems via unsupervised learning, Error bounds for constant step-size \(Q\)-learning, Sensor-based learning for practical planning of fine motions in robotics, Dynamic treatment regimes: technical challenges and applications, The relation between reinforcement learning parameters and the influence of reinforcement history on choice behavior, Permissive planning: Extending classical planning to uncertain task domains., Learning to compete, coordinate, and cooperate in repeated games using 
reinforcement learning, The optimal unbiased value estimator and its relation to LSTD, TD and MC, Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems, Offline reinforcement learning with task hierarchies, Reinforcement learning algorithms with function approximation: recent advances and applications, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains, Lyapunov stability-based control and identification of nonlinear dynamical systems using adaptive dynamic programming, Q-learning with censored data, Improving reinforcement learning by using sequence trees, Adaptive dynamic programming and optimal control of nonlinear nonaffine systems, Decentralized reinforcement learning robust optimal tracking control for time varying constrained reconfigurable modular robot based on ACI and \(Q\)-function, Approximate policy iteration for dynamic resource-constrained project scheduling, Linear programming formulation for non-stationary, finite-horizon Markov decision process models, Variable selection for estimating the optimal treatment regimes in the presence of a large number of covariates, Shape constraints in economics and operations research, Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison, Risk-sensitive reinforcement learning algorithms with generalized average criterion, Multi-sensor transmission power control for remote estimation through a SINR-based communication channel, A projected primal-dual gradient optimal control method for deep reinforcement learning, Learning agents in an artificial power exchange: Tacit collusion, market power and efficiency of two double-auction mechanisms, Q-learning algorithms with random truncation bounds and applications to effective parallel computing, Reinforcement learning: exploration-exploitation dilemma in 
multi-agent foraging task, Data-based analysis of discrete-time linear systems in noisy environment: controllability and observability, Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance, Learning competitive pricing strategies by multi-agent reinforcement learning, Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach, Qualitative case-based reasoning and learning, Decentralized reinforcement learning of robot behaviors, Reinforcement learning endowed with safe veto policies to learn the control of linked-multicomponent robotic systems, Learning to bid in sequential Dutch auctions, Deep reinforcement learning with temporal logics, Integral equations and machine learning, Four encounters with system identification, Integral \(Q\)-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems, A behavioral learning process in games, Sampled fictitious play for approximate dynamic programming, Basic ideas for event-based optimization of Markov systems, Real-time reinforcement learning by sequential actor-critics and experience replay, Algebraic results and bottom-up algorithm for policies generalization in reinforcement learning using concept lattices, Sequential Advantage Selection for Optimal Treatment Regimes, Transfer of learning by composing solutions of elemental sequential tasks, If multi-agent learning is the answer, what is the question?, Free energy, value, and attractors, Reinforcement distribution in fuzzy Q-learning, Collective behavior of artificial intelligence population: transition from optimization to game, A theoretical analysis of temporal difference learning in the iterated prisoner's dilemma game, Stochastic dynamic programming with factored representations, Learning fuzzy classifier systems for multi-agent coordination, \(Q\)- and \(A\)-learning methods for estimating optimal dynamic treatment regimes, 
Deep advantage learning for optimal dynamic treatment regime, Estimation of the optimal regime in treatment of prostate cancer recurrence from observational data using flexible weighting models, Cooperation between independent market makers, Improving Variable Orderings of Approximate Decision Diagrams Using Reinforcement Learning, Nonasymptotic Analysis of Monte Carlo Tree Search, The Concept of Opposition and Its Use in Q-Learning and Q(λ) Techniques, Opposite Actions in Reinforced Image Segmentation, Optimal control of aging in complex networks, Estimation for optimal treatment regimes with survival data under semiparametric model, Asymptotics of Reinforcement Learning with Neural Networks, Deep differentiable reinforcement learning and optimal trading, Double Deep Q-Learning for Optimal Execution, Fictitious Play in Zero-Sum Stochastic Games, Automated Reinforcement Learning (AutoRL): A Survey and Open Problems, Self-improving Q-learning based controller for a class of dynamical processes, Model-Free control performance improvement using virtual reference feedback tuning and reinforcement Q-learning, Adaptive contrast weighted learning for multi‐stage multi‐treatment decision‐making, Efficient Time-Stepping for Numerical Integration Using Reinforcement Learning, Off-line approximate dynamic programming for the vehicle routing problem with a highly variable customer basis and stochastic demands, Gain parameters optimization strategy of cross-coupled controller based on deep reinforcement learning, Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems, Constrained plasticity reserve as a natural way to control frequency and weights in spiking neural networks, On estimating optimal regime for treatment initiation time based on restricted mean residual lifetime, The Boltzmann distribution in the problem of rational choice by population of a patch under an imperfect information 
about its resources, Combining variable neighborhood search and machine learning to solve the vehicle routing problem with crowd-shipping, A perspective on machine learning methods in turbulence modeling, Fairness-Oriented Learning for Optimal Individualized Treatment Rules, Neighbor Q‐learning based consensus control for discrete‐time multi‐agent systems, Path planning of mobile robot in unknown dynamic continuous environment using reward‐modified deep Q‐network, A General Framework for Subgroup Detection via One-Step Value Difference Estimation, Estimating Tree-Based Dynamic Treatment Regimes Using Observational Data with Restricted Treatment Sequences, Learning reward machines: a study in partially observable reinforcement learning, Continuous interval type‐2 fuzzy Q‐learning algorithm for trajectory tracking tasks for vehicles, A flow based formulation and a reinforcement learning based strategic oscillation for cross-dock door assignment, Design of reduced-order and pinning controllers for probabilistic Boolean networks using reinforcement learning, Deep reinforcement trading with predictable returns, Model-free finite-horizon optimal control of discrete-time two-player zero-sum games, Empirical deep hedging, A stochastic maximum principle approach for reinforcement learning with parameterized environment, Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time, A DRL based approach for adaptive scheduling of one-of-a-kind production, Model-Assisted Uniformly Honest Inference for Optimal Treatment Regimes in High Dimension, A Discrete-Time Switching System Analysis of Q-Learning, Collaborative optimization of last-train timetables for metro network to increase service time for passengers, Refutation of spectral graph theory conjectures with Monte Carlo search, A framework for transforming specifications in reinforcement learning, Near-grazing bifurcations and deep reinforcement 
learning control of an impact oscillator with elastic constraints, Poster Abstract: Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives, A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning, A Sparse Random Projection-Based Test for Overall Qualitative Treatment Effects, Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes, Experience replay–based output feedback Q‐learning scheme for optimal output tracking control of discrete‐time linear systems, Toward Nonlinear Local Reinforcement Learning Rules Through Neuroevolution, Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference, Multi-Armed Angle-Based Direct Learning for Estimating Optimal Individualized Treatment Rules With Various Outcomes, Two-phase selective decentralization to improve reinforcement learning systems with MDP, Why the ‘selfish’ optimizing agents could solve the decentralized reinforcement learning problems, Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme, Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints, Quadratic approximate dynamic programming for input‐affine systems, Control of chaotic systems by deep reinforcement learning, A strategy for controlling nonlinear systems using a learning automaton, Continuous Action Generation of Q‐Learning in Multi‐Agent Cooperation, Rationality of reward sharing in multi-agent reinforcement learning, A tutorial survey of reinforcement learning, The actor-critic algorithm as multi-time-scale stochastic approximation., Stochastic approximation algorithms: overview and recent trends., Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures, Bayesian Exploration for Approximate Dynamic Programming, Active cloaking in Stokes flows via reinforcement learning, A set‐based 
model‐free reinforcement learning design technique for nonlinear systems, Model-Free Reinforcement Learning for Stochastic Parity Games, Set‐valued dynamic treatment regimes for competing outcomes, Cooperative learning with joint state value approximation for multi-agent systems, Estimation of Individualized Decision Rules Based on an Optimized Covariate-Dependent Equivalent of Random Outcomes, Adaptive dynamic programming for model‐free tracking of trajectories with time‐varying parameters, Active Inference: Demystified and Compared, Empirical Q-Value Iteration, Learning When-to-Treat Policies, Robust Q-Learning, Coordination problems on networks revisited: statics and dynamics, Global and Local Environment State Information as Neural Network Input by Solving the Battleship Game, Navigation of micro-swimmers in steady flow: the importance of symmetries, Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis, Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis, Concentration of Contractive Stochastic Approximation and Reinforcement Learning, Model-based Reinforcement Learning: A Survey, Universal and optimal coin sequences for high entanglement generation in 1D discrete time quantum walks, Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method, Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning, A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications, Reinforcement learning for robotic manipulation using simulated locomotion demonstrations, Unbounded dynamic programming via the Q-transform, An information-theoretic analysis of return maximization in reinforcement learning, The application of temporal difference learning in optimal 
diet models, A two-layer networked learning control system using actor-critic neural network, Integration, participation and optimal control in water resources planning and management, A sojourn-based approach to semi-Markov reinforcement learning, A general criterion and an algorithmic framework for learning in multi-agent systems, Infinite lattice learner: an ensemble for incremental learning, Augmented direct learning for conditional average treatment effect estimation with double robustness, Restricted gradient-descent algorithm for value-function approximation in reinforcement learning, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Learning Intelligent Controls in High Speed Networks: Synergies of Computational Intelligence with Control and Q-Learning Theories, Quantile-Optimal Treatment Regimes, Self-triggered control of probabilistic Boolean control networks: a reinforcement learning approach, Model-free finite-horizon optimal tracking control of discrete-time linear systems, Lipschitzness is all you need to tame off-policy generative adversarial imitation learning, Planning for potential: efficient safe reinforcement learning, Reinforcement learning with algorithms from probabilistic structure estimation, Determination of optimal prevention strategy for COVID-19 based on multi-agent simulation, Concordance and value information criteria for optimal treatment decision, Dynamic portfolio choice: a simulation-and-regression approach, Learning to steer nonlinear interior-point methods, Max-plus approximation for reinforcement learning, On the convergence of reinforcement learning with Monte Carlo exploring starts, Exploration-exploitation in multi-agent learning: catastrophe theory meets game theory, The aircraft runway scheduling problem: a survey, Dominance, sharing, and assessment in an iterated hawk-dove game, Reinforcement learning for combinatorial optimization: a survey, A learning based 
algorithm for drone routing, Estimating permanent price impact via machine learning, Neural circuits for learning context-dependent associations of stimuli, Reinforcement learning enhanced multi-neighborhood tabu search for the max-mean dispersion problem, On the effect of probing noise in optimal control LQR via Q-learning using adaptive filtering algorithms, Habits as adaptations: an experimental study, An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning, Agent-based Modeling and Simulation of Competitive Wholesale Electricity Markets, New algorithms of the Q-learning type, Water reservoir control under economic, social and environmental constraints, Scalable attack on graph data by injecting vicious nodes, From model-based control to data-driven control: survey, classification and perspective, A Q-learning predictive control scheme with guaranteed stability, An online multi-agent co-operative learning algorithm in POMDPs, A Method to Effectively Detect Vulnerabilities on Path Planning of VIN, Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals, Application of reinforcement learning to the game of Othello, Matched Learning for Optimizing Individualized Treatment Strategies Using Electronic Health Records, Q-learning-based target selection for bearings-only autonomous navigation, A concentration bound for contractive stochastic approximation, Adaptive stepsize selection for tracking in a regime-switching environment, A tutorial on the cross-entropy method, Approximate dynamic programming-based approaches for input--output data-driven control of nonlinear processes, The asymptotic equipartition property in reinforcement learning and its relation to return maximization, Performance prediction of an unmanned airborne vehicle multi-agent system, On-policy concurrent reinforcement learning, Single-leader-multiple-follower games with boundedly rational 
agents, Structural estimation of real options models, Intelligent analysis of chaos roughness in regularity of walk for a two legged robot, Learning when to say no, TD(λ) learning without eligibility traces: a theoretical analysis, Learning nested agent models in an information economy, Model-based learning of interaction strategies in multi-agent systems, Learning dynamic prices in electronic retail markets with customer segmentation, A performance-centred approach to optimising maintenance of complex systems, An Approximate Dynamic Programming Algorithm for Monotone Value Functions, Efficient computation of optimal actions, Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states, Empirical Dynamic Programming, Fitted Q-iteration by functional networks for control problems, Active Localization of Multiple Targets from Noisy Relative Measurements, Closed-Loop Deep Learning: Generating Forward Models With Backpropagation, A reinforcement learning approach for dynamic multi-objective optimization, Convergence results on stochastic adaptive learning, Data-driven optimal control with a relaxed linear program, On testing conditional qualitative treatment effects, Q(λ) with Off-Policy Corrections, Adaptive cruise control via adaptive dynamic programming with experience replay, A deep reinforcement learning framework for continuous intraday market bidding, Reinforcement learning and stochastic optimisation, Learning to scan: a deep reinforcement learning approach for personalized scanning in CT imaging, Risk-averse autonomous systems: a brief history and recent developments from the perspective of optimal control, Deep reinforcement learning for \textsf{FlipIt} security game, Optimal control of a class of nonlinear stochastic systems, Learning COVID-19 mitigation strategies using reinforcement learning, Reinforcement learning for the knapsack problem, Demand sensing in \(e\)-business, 
Inhomogeneous deep Q-network for time sensitive applications, Q-learning-based model predictive variable impedance control for physical human-robot collaboration, Fundamental design principles for reinforcement learning algorithms, Dissipativity-based verification for autonomous systems in adversarial environments, Multi-agent reinforcement learning: a selective overview of theories and algorithms, A top-down approach to attain decentralized multi-agents, Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning, Reinforcement learning for multi-item retrieval in the puzzle-based storage system, A simulation-based approach to stochastic dynamic programming, Robotic dance modeling methods, Robust event-driven interactions in cooperative multi-agent learning, Improving branch-and-bound using decision diagrams and reinforcement learning, Generative methods for sampling transition paths in molecular dynamics, On the sample complexity of actor-critic method for reinforcement learning with function approximation, Certified reinforcement learning with logic guidance, Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality, Uncovering instabilities in variational-quantum deep Q-networks, Tutorial on Amortized Optimization, Model-free mean-field reinforcement learning: mean-field MDP and mean-field Q-learning, A self‐adaptive SAC‐PID control approach based on reinforcement learning for mobile robots, Transformation-Invariant Learning of Optimal Individualized Decision Rules with Time-to-Event Outcomes, A note on generalized second-order value iteration in Markov decision processes, Gradient temporal-difference learning for off-policy evaluation using emphatic weightings, Recent advances in reinforcement learning in finance, Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning, Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic 
games, Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons, Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection, Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning, Data-based \(\mathcal{L}_2\) gain optimal control for discrete-time system with unknown dynamics, Flexible inference of optimal individualized treatment strategy in covariate adjusted randomization with multiple covariates, Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning, Optimal transmission scheduling for remote state estimation in CPSs with energy harvesting two-hop relay networks, A stochastic contraction mapping theorem, From Reinforcement Learning to Deep Reinforcement Learning: An Overview, A lexicographic optimization approach for a bi-objective parallel-machine scheduling problem minimizing total quality loss and total tardiness, Optimal liquidation through a limit order book: a neural network and simulation approach, Learning through imitation by using formal verification, Robust flipping stabilization of Boolean networks: a \(Q\)-learning approach, Approximated multi-agent fitted Q iteration, Robust estimation of heterogeneous treatment effects: an algorithm-based approach, Learning Non-monotone Optimal Individualized Treatment Regimes, Value Iteration is Optic Composition, Settling the sample complexity of model-based offline reinforcement learning, Evolving interpretable decision trees for reinforcement learning, Sliding mode based fault diagnosis with deep reinforcement learning add‐ons for intrinsically redundant manipulators, Underestimation estimators to Q-learning, Finite‐horizon H∞ tracking control for discrete‐time linear systems, Designing optimal, data-driven policies from multisite randomized trials, Independent learning in stochastic games, The synchronized ambient calculus



Cites Work