Simple statistical gradient-following algorithms for connectionist reinforcement learning
From MaRDI portal
Publication:1812928
DOI: 10.1007/BF00992696
zbMath: 0772.68076
Wikidata: Q39487141
Scholia: Q39487141
MaRDI QID: Q1812928
Publication date: 11 August 1992
Published in: Machine Learning
Related Items (only showing first 100 items)
- A SELF-IMPROVING FUZZY CEREBELLAR MODEL ARTICULATION CONTROLLER WITH STOCHASTIC ACTION GENERATION
- Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search
- Adaptive learning via selectionism and Bayesianism. I: Connection between the two
- Environment-driven distributed evolutionary adaptation in a population of autonomous robotic agents
- The factored policy-gradient planner
- Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
- Learning to attend: modeling the shaping of selectivity in infero-temporal cortex in a categorization task
- Learning flexible sensori-motor mappings in a complex network
- Adaptive playouts for online learning of policies during Monte Carlo tree search
- Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
- Continuous action set learning automata for stochastic optimization
- Reinforcement learning for a biped robot based on a CPG-actor-critic method
- Learning the travelling salesperson problem requires rethinking generalization
- Unnamed Item
- Bayesian Variational Inference for Exponential Random Graph Models
- A study of mechanisms for improving robotic group performance
- Stochastic Learning Approach for Binary Optimization: Application to Bayesian Optimal Design of Experiments
- Learning to compute the metric dimension of graphs
- A stochastic policy search model for matching behavior
- Active inference and agency: optimal control without cost functions
- Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Hybrid offline/online optimization for energy management via reinforcement learning
- Model-based contextual policy search for data-efficient generalization of robot skills
- A reinforcement learning approach to the orienteering problem with time windows
- Reinforcement learning for combinatorial optimization: a survey
- Supervised Visual Attention for Simultaneous Multimodal Machine Translation
- Synaptic dynamics: linear model and adaptation algorithm
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Immediate return preference emerged from a synaptic learning rule for return maximization
- Two forms of immediate reward reinforcement learning for exploratory data analysis
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
- Rationalizing predictions by adversarial information calibration
- A novel online gait optimization approach for biped robots with point-feet
- Unnamed Item
- Policy search for motor primitives in robotics
- Unnamed Item
- Unnamed Item
- Constructing effective personalized policies using counterfactual inference from biased data sets with many features
- Dynamic graph conv-LSTM model with dynamic positional encoding for the large-scale traveling salesman problem
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- GSNs: generative stochastic networks
- Analysis and improvement of policy gradient estimation
- Unnamed Item
- Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
- A Reward-Maximizing Spiking Neuron as a Bounded Rational Decision Maker
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
- STDP-Compatible Approximation of Backpropagation in an Energy-Based Model
- An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity
- Autonomous reinforcement learning with experience replay
- Approximate Bayesian model inversion for PDEs with heterogeneous and state-dependent coefficients
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
- Deep Reinforcement Learning: A State-of-the-Art Walkthrough
- A projected primal-dual gradient optimal control method for deep reinforcement learning
- Set-to-Sequence Methods in Machine Learning: A Review
- Deep reinforcement learning for the optimal placement of cryptocurrency limit orders
- Enhance load forecastability: optimize data sampling policy by reinforcing user behaviors
- Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes
- A Learning Framework for Winner-Take-All Networks with Stochastic Synapses
- Adaptive Learning Algorithm Convergence in Passive and Reactive Environments
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
- Unnamed Item
- A review on deep reinforcement learning for fluid mechanics
- Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
- Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
- Node perturbation learning without noiseless baseline
- Estimation of distributions involving unobservable events: the case of optimal search with unknown target distributions
- Unnamed Item
- Optimal node perturbation in linear perceptrons with uncertain eligibility trace
- Model-based reinforcement learning with dimension reduction
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Importance sampling in reinforcement learning with an estimated behavior policy
- HNS: hierarchical negative sampling for network representation learning
- Branes with brains: exploring string vacua with deep reinforcement learning
- Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
- Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
- Compatible natural gradient policy search
- TD-regularized actor-critic methods
- Deep reinforcement learning for inventory control: a roadmap
- Reinforcement learning in the brain
- Varieties of Helmholtz machine
- Risk-averse policy optimization via risk-neutral policy optimization
- Unnamed Item
- Unnamed Item
- Autonomous vehicle navigation using evolutionary reinforcement learning
- Natural actor-critic algorithms
- Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
- Measurement error models: from nonparametric methods to deep neural networks
- Neural large neighborhood search for routing problems
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Unnamed Item
- Unnamed Item
- Opportunities for reinforcement learning in stochastic dynamic vehicle routing
- Estimation and approximation bounds for gradient-based reinforcement learning
- High generalization performance structured self-attention model for knapsack problem
- Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes
- Heavy-tails and randomized restarting beam search in goal-oriented neural sequence decoding
Cites Work
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Associative search network: A reinforcement learning associative memory
- An N-player sequential stochastic game with identical payoffs
- A new approach to the design of reinforcement schemes for learning automata
- Decentralized learning in finite Markov chains
- Pattern-recognizing stochastic learning automata