Practical issues in temporal difference learning
From MaRDI portal
Publication:1812929
DOI: 10.1007/BF00992697
zbMATH Open: 0772.68075
DBLP: journals/ml/Tesauro92
Wikidata: Q56225518
Scholia: Q56225518
MaRDI QID: Q1812929
FDO: Q1812929
Authors: Gerald Tesauro
Publication date: 11 August 1992
Published in: Machine Learning
Recommendations
- Artificial Intelligence and Soft Computing - ICAISC 2004
- AN ANALYSIS OF EXPERIENCE REPLAY IN TEMPORAL DIFFERENCE LEARNING
- TD(λ) learning without eligibility traces: a theoretical analysis
- Temporal difference learning applied to game playing and the results of application to Shogi
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
Cites Work
- Learning representations by back-propagating errors
- A Stochastic Approximation Method
- On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
- Learnability and the Vapnik-Chervonenkis dimension
- Multilayer feedforward networks are universal approximators
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
- A pattern classification approach to evaluation function learning
- A comparison and evaluation of three machine learning procedures as applied to the game of checkers
- A parallel network that learns to play backgammon
- On Optimal Doubling in Backgammon
Cited In (42)
- Learning long-term chess strategies from databases
- TD(λ) learning without eligibility traces: a theoretical analysis
- Reinforcement Learning, Bit by Bit
- Approximate dynamic programming for stochastic \(N\)-stage optimization with application to optimal consumption under uncertainty
- Games, computers, and artificial intelligence
- Programming backgammon using self-teaching neural nets
- New Versions of Gradient Temporal-Difference Learning
- An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification
- Bayesian exploration for approximate dynamic programming
- A tutorial survey of reinforcement learning
- Many-Layered Learning
- A brief survey on nonlinear control using adaptive dynamic programming under engineering-oriented complexities
- Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems
- Two steps reinforcement learning
- Approximate dynamic programming via direct search in the space of value function approximations
- Artificial Intelligence and Soft Computing - ICAISC 2004
- The loss from imperfect value functions in expectation-based and minimax-based tasks
- SOLVING DYNAMIC WILDLIFE RESOURCE OPTIMIZATION PROBLEMS USING REINFORCEMENT LEARNING
- Model-based average reward reinforcement learning
- Deep learning of support vector machines with class probability output networks
- Reinforcement learning with replacing eligibility traces
- Computer Go: An AI oriented survey
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- Learning metric-topological maps for indoor mobile robot navigation
- A theoretical analysis of temporal difference learning in the iterated prisoner's dilemma game
- Hyperbolically Discounted Temporal Difference Learning
- Two-agent IDA*
- Feature-based methods for large scale dynamic programming
- A reinforcement learning approach for dynamic multi-objective optimization
- Error controlled actor-critic
- Reinforcement learning of non-Markov decision processes
- Eligibility traces and forgetting factor in recursive least-squares-based temporal difference
- PLAYER CO-MODELLING IN A STRATEGY BOARD GAME: DISCOVERING HOW TO PLAY FAST
- Cooperation of categorical and behavioral learning in a practical solution to the abstraction problem
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- An approximate dynamic programming method for multi-input multi-output nonlinear system
- Deep reinforcement learning for inventory control: a roadmap
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
- Approximate policy iteration: a survey and some new methods
- Linear least-squares algorithms for temporal difference learning
- Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results
- Optimal liquidation through a limit order book: a neural network and simulation approach