Simple statistical gradient-following algorithms for connectionist reinforcement learning

DOI 10.1007/BF00992696

MaRDI QID Q1812928

Authors Ronald J. Williams

Publication date 11 August 1992

Published in Machine Learning

zbMATH Keywords

reinforcement learning; gradient descent; connectionist networks

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Recommendations

A tutorial survey of reinforcement learning
\({\mathcal Q}\)-learning
Learning automata in feedforward connectionist systems
Reinforcement learning theory, algorithms and its application
Pattern-recognizing stochastic learning automata

Cites work

scientific article; zbMATH DE number 4066707
scientific article; zbMATH DE number 3657150
scientific article; zbMATH DE number 3551675
A new approach to the design of reinforcement schemes for learning automata
An N-player sequential stochastic game with identical payoffs
Associative search network: A reinforcement learning associative memory
Decentralized learning in finite Markov chains
Pattern-recognizing stochastic learning automata

Cited in

(only showing first 100 items)

Approximate Bayesian model inversion for PDEs with heterogeneous and state-dependent coefficients
Deep reinforcement learning for option pricing and hedging under dynamic expectile risk measures
Neural architecture search: a survey
Automated Deep Learning: Neural Architecture Search Is Not the End
Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge
Efficient multi-objective neural architecture search framework via policy gradient algorithm
Leveraging randomized smoothing for optimal control of nonsmooth dynamical systems
Differentiable particle filters with smoothly jittered resampling
Development of a machine learning-based design optimization method for crashworthiness analysis
Pattern-recognizing stochastic learning automata
Branes with brains: exploring string vacua with deep reinforcement learning
Ancestral Gumbel-top-\(k\) sampling for sampling without replacement
TD-regularized actor-critic methods
Geometry and convergence of natural policy gradient methods
Supervised Visual Attention for Simultaneous Multimodal Machine Translation
A differential Hebbian framework for biologically-plausible motor control
Knowledge graph embedding with shared latent semantic units
Variational actor-critic algorithms
The factored policy-gradient planner
Estimation and approximation bounds for gradient-based reinforcement learning
Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
Robust flow control and optimal sensor placement using deep reinforcement learning
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Learning flexible sensori-motor mappings in a complex network
Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
Heavy-tails and randomized restarting beam search in goal-oriented neural sequence decoding
Estimation of distributions involving unobservable events: the case of optimal search with unknown target distributions
Recent advances in reinforcement learning in finance
A novel online gait optimization approach for biped robots with point-feet
Adaptive learning via selectionism and Bayesianism. I: Connection between the two
Node perturbation learning without noiseless baseline
Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search
Learn and route: learning implicit preferences for vehicle routing
Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
Search-engine-augmented dialogue response generation with cheaply supervised query production
scientific article; zbMATH DE number 7306857
Enhance load forecastability: optimize data sampling policy by reinforcing user behaviors
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
Convergence of entropy-regularized natural policy gradient with linear function approximation
Reconstruction of incomplete wildfire data using deep generative models
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
A reward-maximizing spiking neuron as a bounded rational decision maker
Mining gold from implicit models to improve likelihood-free inference
A tutorial survey of reinforcement learning
A two-step algorithm for learning from unspecific reinforcement
Nonconvex policy search using variational inequalities
Connecting stochastic optimal control and reinforcement learning
Deep learning in computational mechanics: a review
Approximate Newton Policy Gradient Algorithms
Employing reinforcement learning to enhance particle swarm optimization methods
Autonomous vehicle navigation using evolutionary reinforcement learning
Deep Reinforcement Learning: A State-of-the-Art Walkthrough
Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach
Adaptive playouts for online learning of policies during Monte Carlo tree search
Premium control with reinforcement learning
Active inference and agency: optimal control without cost functions
Solving the traveling salesperson problem with precedence constraints by deep reinforcement learning
Zeroth-order optimization with orthogonal random directions
A study of mechanisms for improving robotic group performance
Artificial intelligence for games
Reinforcement learning
HiAM: a hierarchical attention based model for knowledge graph multi-hop reasoning
Analysis and improvement of policy gradient estimation
Machine Learning: ECML 2004
STDP-compatible approximation of backpropagation in an energy-based model
Stochastic learning approach for binary optimization: application to Bayesian optimal design of experiments
Deep reinforcement learning for the optimal placement of cryptocurrency limit orders
From Reinforcement Learning to Deep Reinforcement Learning: An Overview
A review on deep reinforcement learning for fluid mechanics
Natural actor-critic algorithms
scientific article; zbMATH DE number 67800
Continuous action set learning automata for stochastic optimization
Stochastic dynamics of reinforcement learning
Optimal node perturbation in linear perceptrons with uncertain eligibility trace
Non-parametric policy search with limited information loss
scientific article; zbMATH DE number 7370615
Autonomous reinforcement learning with experience replay
scientific article; zbMATH DE number 7307467
Attention-based exploitation and exploration strategy for multi-hop knowledge graph reasoning
Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
Immediate return preference emerged from a synaptic learning rule for return maximization
Synaptic dynamics: linear model and adaptation algorithm
Learning to attend: modeling the shaping of selectivity in infero-temporal cortex in a categorization task
Posterior weighted reinforcement learning with state uncertainty
Reinforcement learning for combinatorial optimization: a survey
scientific article; zbMATH DE number 7453114
Set-to-Sequence Methods in Machine Learning: A Review
An Introduction to Neural Data Compression
Greedy attack and Gumbel attack: generating adversarial examples for discrete data
NeVAE: a deep generative model for molecular graphs
Softmax policy gradient methods can take exponential time to converge
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Learning to compute the metric dimension of graphs
High generalization performance structured self-attention model for knapsack problem
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
Fast global convergence of natural policy gradient methods with entropy regularization
A learning framework for winner-take-all networks with stochastic synapses
Multi-agent reinforcement learning aided sampling algorithms for a class of multiscale inverse problems
