Simple statistical gradient-following algorithms for connectionist reinforcement learning
Publication:1812928
DOI: 10.1007/BF00992696
zbMATH Open: 0772.68076
DBLP: journals/ml/Williams92
Wikidata: Q39487141
Scholia: Q39487141
MaRDI QID: Q1812928
FDO: Q1812928
Authors: Ronald J. Williams
Publication date: 11 August 1992
Published in: Machine Learning
Cites Work
- Pattern-recognizing stochastic learning automata
- A new approach to the design of reinforcement schemes for learning automata
- Decentralized learning in finite Markov chains
- Title not available
- Title not available
- Associative search network: A reinforcement learning associative memory
- Title not available
- An N-player sequential stochastic game with identical payoffs
Cited In (first 100 items shown)
- Pattern-recognizing stochastic learning automata
- Branes with brains: exploring string vacua with deep reinforcement learning
- TD-regularized actor-critic methods
- The factored policy-gradient planner
- Estimation and approximation bounds for gradient-based reinforcement learning
- Learning flexible sensori-motor mappings in a complex network
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
- Estimation of distributions involving unobservable events: the case of optimal search with unknown target distributions
- Node perturbation learning without noiseless baseline
- Adaptive learning via selectionism and Bayesianism. I: Connection between the two
- Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
- Mining gold from implicit models to improve likelihood-free inference
- Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
- A tutorial survey of reinforcement learning
- Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
- Nonconvex policy search using variational inequalities
- Zeroth-order optimization with orthogonal random directions
- Autonomous vehicle navigation using evolutionary reinforcement learning
- Adaptive playouts for online learning of policies during Monte Carlo tree search
- Active inference and agency: optimal control without cost functions
- A study of mechanisms for improving robotic group performance
- From Reinforcement Learning to Deep Reinforcement Learning: An Overview
- Analysis and improvement of policy gradient estimation
- Title not available
- Stochastic dynamics of reinforcement learning
- Natural actor-critic algorithms
- Continuous action set learning automata for stochastic optimization
- Title not available
- Optimal node perturbation in linear perceptrons with uncertain eligibility trace
- Autonomous reinforcement learning with experience replay
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Immediate return preference emerged from a synaptic learning rule for return maximization
- Synaptic dynamics: linear model and adaptation algorithm
- An Introduction to Neural Data Compression
- Reinforcement learning for combinatorial optimization: a survey
- Multi-agent reinforcement learning aided sampling algorithms for a class of multiscale inverse problems
- Title not available
- Policy search for motor primitives in robotics
- A self-improving fuzzy cerebellar model articulation controller with stochastic action generation
- Two forms of immediate reward reinforcement learning for exploratory data analysis
- Importance sampling in reinforcement learning with an estimated behavior policy
- HNS: hierarchical negative sampling for network representation learning
- Natural reweighted wake-sleep
- Measurement error models: from nonparametric methods to deep neural networks
- Title not available
- Model-based reinforcement learning with dimension reduction
- A stochastic policy search model for matching behavior
- Efficient sample reuse in policy gradients with parameter-based exploration
- Recurrent policy gradients
- Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
- Model selection in Bayesian neural networks via horseshoe priors
- GSNs: generative stochastic networks
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Reinforcement learning in the brain
- Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
- Varieties of Helmholtz machine
- Opportunities for reinforcement learning in stochastic dynamic vehicle routing
- Model-based contextual policy search for data-efficient generalization of robot skills
- Compatible natural gradient policy search
- Title not available
- Neural large neighborhood search for routing problems
- Bayesian Variational Inference for Exponential Random Graph Models
- A projected primal-dual gradient optimal control method for deep reinforcement learning
- Learning the travelling salesperson problem requires rethinking generalization
- Deep reinforcement learning for option pricing and hedging under dynamic expectile risk measures
- Neural architecture search: a survey
- Approximate Bayesian model inversion for PDEs with heterogeneous and state-dependent coefficients
- Constructing effective personalized policies using counterfactual inference from biased data sets with many features
- Automated Deep Learning: Neural Architecture Search Is Not the End
- Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge
- Efficient multi-objective neural architecture search framework via policy gradient algorithm
- Leveraging randomized smoothing for optimal control of nonsmooth dynamical systems
- Differentiable particle filters with smoothly jittered resampling
- Development of a machine learning-based design optimization method for crashworthiness analysis
- Ancestral Gumbel-top-\(k\) sampling for sampling without replacement
- Geometry and convergence of natural policy gradient methods
- A differential Hebbian framework for biologically-plausible motor control
- Knowledge graph embedding with shared latent semantic units
- Supervised Visual Attention for Simultaneous Multimodal Machine Translation
- Variational actor-critic algorithms
- Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
- Robust flow control and optimal sensor placement using deep reinforcement learning
- Recent advances in reinforcement learning in finance
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Heavy-tails and randomized restarting beam search in goal-oriented neural sequence decoding
- A novel online gait optimization approach for biped robots with point-feet
- Learn and route: learning implicit preferences for vehicle routing
- Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search
- Search-engine-augmented dialogue response generation with cheaply supervised query production
- Convergence of entropy-regularized natural policy gradient with linear function approximation
- Title not available
- Reconstruction of incomplete wildfire data using deep generative models
- Enhance load forecastability: optimize data sampling policy by reinforcing user behaviors
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
- A reward-maximizing spiking neuron as a bounded rational decision maker
- Connecting stochastic optimal control and reinforcement learning
- Deep learning in computational mechanics: a review
- Approximate Newton Policy Gradient Algorithms