\({\mathcal Q}\)-learning

Publication:1812931

DOI: 10.1007/BF00992698 ⋮ zbMath: 0773.68062 ⋮ Wikidata: Q57424214 ⋮ Scholia: Q57424214 ⋮ MaRDI QID: Q1812931

Peter Dayan, Christopher J. C. H. Watkins

Publication date: 11 August 1992

Published in: Machine Learning

zbMATH Keywords

reinforcement learning ⋮ temporal differences ⋮ \({\mathcal Q}\)-learning ⋮ asynchronous dynamic programming

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Related Items (first 100 items shown)

A time aggregation approach to Markov decision processes ⋮ D-learning to estimate optimal individual treatment rules ⋮ Tree-based reinforcement learning for estimating optimal dynamic treatment regimes ⋮ Safe learning for near-optimal scheduling ⋮ Model-free reinforcement learning for branching Markov decision processes ⋮ Output feedback Q-learning for discrete-time linear zero-sum games with application to the \(H_\infty\) control ⋮ Imitation guided learning in learning classifier systems ⋮ A learning classifier system for mazes with aliasing clones ⋮ Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function ⋮ Bounded rationality and search over small-world models ⋮ A Markovian mechanism of proportional resource allocation in the incentive model as a dynamic stochastic inverse Stackelberg game ⋮ Q-learning agents in a Cournot oligopoly model ⋮ Multiscale Q-learning with linear function approximation ⋮ How hierarchical models improve point estimates of model parameters at the individual level ⋮ Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design ⋮ Autonomous agents modelling other agents: a comprehensive survey and open problems ⋮ Intelligent multiple search strategy cuckoo algorithm for numerical and engineering optimization problems ⋮ Probabilistic inference for determining options in reinforcement learning ⋮ Perspectives of approximate dynamic programming ⋮ High-dimensional \(A\)-learning for optimal dynamic treatment regimes ⋮ Bifurcation mechanism design -- from optimal flat taxes to better cancer treatments ⋮ High-dimensional inference for personalized treatment decision ⋮ Linear least-squares algorithms for temporal difference learning ⋮ Feature-based methods for large scale dynamic programming ⋮ The loss from imperfect value functions in expectation-based and minimax-based tasks ⋮ The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms
⋮ Active inference and agency: optimal control without cost functions ⋮ Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning ⋮ Machine learning in agent-based stochastic simulation: inferential theory and evaluation in transportation logistics ⋮ Open problems in universal induction & intelligence ⋮ Designing time difference learning for interference management in heterogeneous networks ⋮ Multi-objective optimization of water-using systems ⋮ Cycle frequency in standard rock-paper-scissors games: evidence from experimental economics ⋮ Learning to compose fuzzy behaviors for autonomous agents ⋮ An adaptive learning model with foregone payoff information ⋮ Testing probabilistic equivalence through reinforcement learning ⋮ Model-based average reward reinforcement learning ⋮ Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment ⋮ Model-based estimation of subjective values using choice tasks with probabilistic feedback ⋮ Energy management for stationary electric energy storage systems: a systematic literature review ⋮ A boundedness result for the direct heuristic dynamic programming ⋮ Designing decentralized controllers for distributed-air-jet MEMS-based micromanipulators by reinforcement learning ⋮ A human-robot collaborative reinforcement learning algorithm ⋮ Distributed reinforcement learning for coordinate multi-robot foraging ⋮ Automatic generation of fuzzy inference systems via unsupervised learning ⋮ Error bounds for constant step-size \(Q\)-learning ⋮ Sensor-based learning for practical planning of fine motions in robotics ⋮ Dynamic treatment regimes: technical challenges and applications ⋮ The relation between reinforcement learning parameters and the influence of reinforcement history on choice behavior ⋮ Permissive planning: Extending classical planning to uncertain task domains
⋮ Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning ⋮ The optimal unbiased value estimator and its relation to LSTD, TD and MC ⋮ Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems ⋮ Offline reinforcement learning with task hierarchies ⋮ Reinforcement learning algorithms with function approximation: recent advances and applications ⋮ Preference-based reinforcement learning: a formal framework and a policy iteration algorithm ⋮ Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains ⋮ Lyapunov stability-based control and identification of nonlinear dynamical systems using adaptive dynamic programming ⋮ Q-learning with censored data ⋮ Improving reinforcement learning by using sequence trees ⋮ Adaptive dynamic programming and optimal control of nonlinear nonaffine systems ⋮ Decentralized reinforcement learning robust optimal tracking control for time varying constrained reconfigurable modular robot based on ACI and \(Q\)-function ⋮ Approximate policy iteration for dynamic resource-constrained project scheduling ⋮ Linear programming formulation for non-stationary, finite-horizon Markov decision process models ⋮ Variable selection for estimating the optimal treatment regimes in the presence of a large number of covariates ⋮ Shape constraints in economics and operations research ⋮ Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison ⋮ Risk-sensitive reinforcement learning algorithms with generalized average criterion ⋮ Multi-sensor transmission power control for remote estimation through a SINR-based communication channel ⋮ A projected primal-dual gradient optimal control method for deep reinforcement learning ⋮ Learning agents in an artificial power exchange: Tacit collusion, market power and efficiency of two double-auction mechanisms ⋮ Q-learning algorithms with random truncation bounds and applications to effective parallel computing
⋮ Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task ⋮ Data-based analysis of discrete-time linear systems in noisy environment: controllability and observability ⋮ Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance ⋮ Learning competitive pricing strategies by multi-agent reinforcement learning ⋮ Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach ⋮ Qualitative case-based reasoning and learning ⋮ Decentralized reinforcement learning of robot behaviors ⋮ Reinforcement learning endowed with safe veto policies to learn the control of linked-multicomponent robotic systems ⋮ Learning to bid in sequential Dutch auctions ⋮ Deep reinforcement learning with temporal logics ⋮ Integral equations and machine learning ⋮ Four encounters with system identification ⋮ Integral \(Q\)-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems ⋮ A behavioral learning process in games ⋮ Sampled fictitious play for approximate dynamic programming ⋮ Basic ideas for event-based optimization of Markov systems ⋮ Real-time reinforcement learning by sequential actor-critics and experience replay ⋮ Algebraic results and bottom-up algorithm for policies generalization in reinforcement learning using concept lattices ⋮ Sequential Advantage Selection for Optimal Treatment Regimes ⋮ Transfer of learning by composing solutions of elemental sequential tasks ⋮ If multi-agent learning is the answer, what is the question?
⋮ Free energy, value, and attractors ⋮ Reinforcement distribution in fuzzy Q-learning ⋮ Collective behavior of artificial intelligence population: transition from optimization to game ⋮ A theoretical analysis of temporal difference learning in the iterated prisoner's dilemma game ⋮ Stochastic dynamic programming with factored representations ⋮ Learning fuzzy classifier systems for multi-agent coordination ⋮ \(Q\)- and \(A\)-learning methods for estimating optimal dynamic treatment regimes
