\({\mathcal Q}\)-learning

From MaRDI portal

Publication:1812931

DOI: 10.1007/BF00992698
zbMath: 0773.68062
Wikidata: Q57424214
Scholia: Q57424214
MaRDI QID: Q1812931

Peter Dayan, Christopher J. C. H. Watkins

Publication date: 11 August 1992

Published in: Machine Learning




Related Items (first 100 items shown)

A time aggregation approach to Markov decision processes
D-learning to estimate optimal individual treatment rules
Tree-based reinforcement learning for estimating optimal dynamic treatment regimes
Safe learning for near-optimal scheduling
Model-free reinforcement learning for branching Markov decision processes
Output feedback Q-learning for discrete-time linear zero-sum games with application to the \(H_\infty\) control
Imitation guided learning in learning classifier systems
A learning classifier system for mazes with aliasing clones
Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
Bounded rationality and search over small-world models
A Markovian mechanism of proportional resource allocation in the incentive model as a dynamic stochastic inverse Stackelberg game
Q-learning agents in a Cournot oligopoly model
Multiscale Q-learning with linear function approximation
How hierarchical models improve point estimates of model parameters at the individual level
Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
Autonomous agents modelling other agents: a comprehensive survey and open problems
Intelligent multiple search strategy cuckoo algorithm for numerical and engineering optimization problems
Probabilistic inference for determining options in reinforcement learning
Perspectives of approximate dynamic programming
High-dimensional \(A\)-learning for optimal dynamic treatment regimes
Bifurcation mechanism design -- from optimal flat taxes to better cancer treatments
High-dimensional inference for personalized treatment decision
Linear least-squares algorithms for temporal difference learning
Feature-based methods for large scale dynamic programming
The loss from imperfect value functions in expectation-based and minimax-based tasks
The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms
Active inference and agency: optimal control without cost functions
Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
Machine learning in agent-based stochastic simulation: inferential theory and evaluation in transportation logistics
Open problems in universal induction & intelligence
Designing time difference learning for interference management in heterogeneous networks
Multi-objective optimization of water-using systems
Cycle frequency in standard rock-paper-scissors games: evidence from experimental economics
Learning to compose fuzzy behaviors for autonomous agents
An adaptive learning model with foregone payoff information
Testing probabilistic equivalence through reinforcement learning
Model-based average reward reinforcement learning
Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment
Model-based estimation of subjective values using choice tasks with probabilistic feedback
Energy management for stationary electric energy storage systems: a systematic literature review
A boundedness result for the direct heuristic dynamic programming
Designing decentralized controllers for distributed-air-jet MEMS-based micromanipulators by reinforcement learning
A human-robot collaborative reinforcement learning algorithm
Distributed reinforcement learning for coordinate multi-robot foraging
Automatic generation of fuzzy inference systems via unsupervised learning
Error bounds for constant step-size \(Q\)-learning
Sensor-based learning for practical planning of fine motions in robotics
Dynamic treatment regimes: technical challenges and applications
The relation between reinforcement learning parameters and the influence of reinforcement history on choice behavior
Permissive planning: Extending classical planning to uncertain task domains.
Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning
The optimal unbiased value estimator and its relation to LSTD, TD and MC
Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems
Offline reinforcement learning with task hierarchies
Reinforcement learning algorithms with function approximation: recent advances and applications
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains
Lyapunov stability-based control and identification of nonlinear dynamical systems using adaptive dynamic programming
Q-learning with censored data
Improving reinforcement learning by using sequence trees
Adaptive dynamic programming and optimal control of nonlinear nonaffine systems
Decentralized reinforcement learning robust optimal tracking control for time varying constrained reconfigurable modular robot based on ACI and \(Q\)-function
Approximate policy iteration for dynamic resource-constrained project scheduling
Linear programming formulation for non-stationary, finite-horizon Markov decision process models
Variable selection for estimating the optimal treatment regimes in the presence of a large number of covariates
Shape constraints in economics and operations research
Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison
Risk-sensitive reinforcement learning algorithms with generalized average criterion
Multi-sensor transmission power control for remote estimation through a SINR-based communication channel
A projected primal-dual gradient optimal control method for deep reinforcement learning
Learning agents in an artificial power exchange: Tacit collusion, market power and efficiency of two double-auction mechanisms
Q-learning algorithms with random truncation bounds and applications to effective parallel computing
Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task
Data-based analysis of discrete-time linear systems in noisy environment: controllability and observability
Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance
Learning competitive pricing strategies by multi-agent reinforcement learning
Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach
Qualitative case-based reasoning and learning
Decentralized reinforcement learning of robot behaviors
Reinforcement learning endowed with safe veto policies to learn the control of linked-multicomponent robotic systems
Learning to bid in sequential Dutch auctions
Deep reinforcement learning with temporal logics
Integral equations and machine learning
Four encounters with system identification
Integral \(Q\)-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems
A behavioral learning process in games
Sampled fictitious play for approximate dynamic programming
Basic ideas for event-based optimization of Markov systems
Real-time reinforcement learning by sequential actor-critics and experience replay
Algebraic results and bottom-up algorithm for policies generalization in reinforcement learning using concept lattices
Sequential Advantage Selection for Optimal Treatment Regimes
Transfer of learning by composing solutions of elemental sequential tasks
If multi-agent learning is the answer, what is the question?
Free energy, value, and attractors
Reinforcement distribution in fuzzy Q-learning
Collective behavior of artificial intelligence population: transition from optimization to game
A theoretical analysis of temporal difference learning in the iterated prisoner's dilemma game
Stochastic dynamic programming with factored representations
Learning fuzzy classifier systems for multi-agent coordination
\(Q\)- and \(A\)-learning methods for estimating optimal dynamic treatment regimes






