Simple statistical gradient-following algorithms for connectionist reinforcement learning

DOI 10.1007/BF00992696; MaRDI QID Q1812928

Authors Ronald J. Williams

Publication date 11 August 1992

Published in Machine Learning

zbMATH Keywords

reinforcement learning gradient descent connectionist networks

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Recommendations

A tutorial survey of reinforcement learning
\({\mathcal Q}\)-learning
Learning automata in feedforward connectionist systems
Reinforcement learning theory, algorithms and its application
Pattern-recognizing stochastic learning automata

Cites work

scientific article; zbMATH DE number 4066707
scientific article; zbMATH DE number 3657150
scientific article; zbMATH DE number 3551675
A new approach to the design of reinforcement schemes for learning automata
An N-player sequential stochastic game with identical payoffs
Associative search network: A reinforcement learning associative memory
Decentralized learning in finite Markov chains
Pattern-recognizing stochastic learning automata

Cited in

(first 100 items only)

A study of mechanisms for improving robotic group performance
Branes with brains: exploring string vacua with deep reinforcement learning
scientific article; zbMATH DE number 7370615
An Introduction to Neural Data Compression
Model-based contextual policy search for data-efficient generalization of robot skills
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Estimation and approximation bounds for gradient-based reinforcement learning
Recurrent policy gradients
Bayesian Variational Inference for Exponential Random Graph Models
Model-based reinforcement learning with dimension reduction
scientific article; zbMATH DE number 1424385
A tutorial survey of reinforcement learning
Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
Immediate return preference emerged from a synaptic learning rule for return maximization
Measurement error models: from nonparametric methods to deep neural networks
Synaptic dynamics: linear model and adaptation algorithm
Stochastic dynamics of reinforcement learning
Optimal node perturbation in linear perceptrons with uncertain eligibility trace
Multi-agent reinforcement learning aided sampling algorithms for a class of multiscale inverse problems
Node perturbation learning without noiseless baseline
TD-regularized actor-critic methods
Adaptive learning via selectionism and Bayesianism. I: Connection between the two
Autonomous vehicle navigation using evolutionary reinforcement learning
Zeroth-order optimization with orthogonal random directions
Reinforcement learning for combinatorial optimization: a survey
Importance sampling in reinforcement learning with an estimated behavior policy
From Reinforcement Learning to Deep Reinforcement Learning: An Overview
scientific article; zbMATH DE number 7370594
HNS: hierarchical negative sampling for network representation learning
Varieties of Helmholtz machine
Deep reinforcement learning for option pricing and hedging under dynamic expectile risk measures
A stochastic policy search model for matching behavior
A two-step algorithm for learning from unspecific reinforcement
Pattern-recognizing stochastic learning automata
A projected primal-dual gradient optimal control method for deep reinforcement learning
Policy search for motor primitives in robotics
Natural actor-critic algorithms
Neural large neighborhood search for routing problems
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
GSNs: generative stochastic networks
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Reinforcement learning in the brain
Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
Continuous action set learning automata for stochastic optimization
Natural reweighted wake-sleep
Nonconvex policy search using variational inequalities
Adaptive playouts for online learning of policies during Monte Carlo tree search
Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
Opportunities for reinforcement learning in stochastic dynamic vehicle routing
A SELF-IMPROVING FUZZY CEREBELLAR MODEL ARTICULATION CONTROLLER WITH STOCHASTIC ACTION GENERATION
Learning the travelling salesperson problem requires rethinking generalization
Estimation of distributions involving unobservable events: the case of optimal search with unknown target distributions
Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
Two forms of immediate reward reinforcement learning for exploratory data analysis
Approximate Bayesian model inversion for PDEs with heterogeneous and state-dependent coefficients
Analysis and improvement of policy gradient estimation
Neural architecture search: a survey
Efficient sample reuse in policy gradients with parameter-based exploration
Compatible natural gradient policy search
Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
scientific article; zbMATH DE number 6982909
Mining gold from implicit models to improve likelihood-free inference
Learning flexible sensori-motor mappings in a complex network
The factored policy-gradient planner
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Autonomous reinforcement learning with experience replay
Active inference and agency: optimal control without cost functions
scientific article; zbMATH DE number 67800
A reinforcement learning approach to the orienteering problem with time windows
A review on deep reinforcement learning for fluid mechanics
Adaptive learning algorithm convergence in passive and reactive environments
scientific article; zbMATH DE number 7306857
Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
Ancestral Gumbel-top-\(k\) sampling for sampling without replacement
Learning to compute the metric dimension of graphs
scientific article; zbMATH DE number 1708090
Heavy-tails and randomized restarting beam search in goal-oriented neural sequence decoding
Model-based Reinforcement Learning: A Survey
Learning to attend: modeling the shaping of selectivity in infero-temporal cortex in a categorization task
Constructing effective personalized policies using counterfactual inference from biased data sets with many features
Solving the traveling salesperson problem with precedence constraints by deep reinforcement learning
Dynamic graph conv-LSTM model with dynamic positional encoding for the large-scale traveling salesman problem
A learning framework for winner-take-all networks with stochastic synapses
scientific article; zbMATH DE number 1966632
Variational actor-critic algorithms
Reconstruction of incomplete wildfire data using deep generative models
Greedy attack and Gumbel attack: generating adversarial examples for discrete data
NeVAE: a deep generative model for molecular graphs
Reinforcement learning theory, algorithms and its application
Deep reinforcement learning for inventory control: a roadmap
scientific article; zbMATH DE number 7370547
Posterior weighted reinforcement learning with state uncertainty
Risk-averse policy optimization via risk-neutral policy optimization
Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
Stochastic learning approach for binary optimization: application to Bayesian optimal design of experiments
Reinforcement learning in sparse-reward environments with hindsight policy gradients
Model selection in Bayesian neural networks via horseshoe priors
