Simple statistical gradient-following algorithms for connectionist reinforcement learning
From MaRDI portal
Cites work
- scientific article; zbMATH DE number 4066707
- scientific article; zbMATH DE number 3657150
- scientific article; zbMATH DE number 3551675
- A new approach to the design of reinforcement schemes for learning automata
- An N-player sequential stochastic game with identical payoffs
- Associative search network: A reinforcement learning associative memory
- Decentralized learning in finite Markov chains
- Pattern-recognizing stochastic learning automata
Cited in
(only showing first 100 items)
- A study of mechanisms for improving robotic group performance
- Branes with brains: exploring string vacua with deep reinforcement learning
- scientific article; zbMATH DE number 7370615
- An Introduction to Neural Data Compression
- Model-based contextual policy search for data-efficient generalization of robot skills
- Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
- Estimation and approximation bounds for gradient-based reinforcement learning
- Recurrent policy gradients
- Bayesian Variational Inference for Exponential Random Graph Models
- Model-based reinforcement learning with dimension reduction
- scientific article; zbMATH DE number 1424385
- A tutorial survey of reinforcement learning
- Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Immediate return preference emerged from a synaptic learning rule for return maximization
- Measurement error models: from nonparametric methods to deep neural networks
- Synaptic dynamics: linear model and adaptation algorithm
- Stochastic dynamics of reinforcement learning
- Optimal node perturbation in linear perceptrons with uncertain eligibility trace
- Multi-agent reinforcement learning aided sampling algorithms for a class of multiscale inverse problems
- Node perturbation learning without noiseless baseline
- TD-regularized actor-critic methods
- Adaptive learning via selectionism and Bayesianism. I: Connection between the two
- Autonomous vehicle navigation using evolutionary reinforcement learning
- Zeroth-order optimization with orthogonal random directions
- Reinforcement learning for combinatorial optimization: a survey
- Importance sampling in reinforcement learning with an estimated behavior policy
- From Reinforcement Learning to Deep Reinforcement Learning: An Overview
- scientific article; zbMATH DE number 7370594
- HNS: hierarchical negative sampling for network representation learning
- Varieties of Helmholtz machine
- Deep reinforcement learning for option pricing and hedging under dynamic expectile risk measures
- A stochastic policy search model for matching behavior
- A two-step algorithm for learning from unspecific reinforcement
- Pattern-recognizing stochastic learning automata
- A projected primal-dual gradient optimal control method for deep reinforcement learning
- Policy search for motor primitives in robotics
- Natural actor-critic algorithms
- Neural large neighborhood search for routing problems
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- GSNs: generative stochastic networks
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Reinforcement learning in the brain
- Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
- Continuous action set learning automata for stochastic optimization
- Natural reweighted wake-sleep
- Nonconvex policy search using variational inequalities
- Adaptive playouts for online learning of policies during Monte Carlo tree search
- Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
- Opportunities for reinforcement learning in stochastic dynamic vehicle routing
- A self-improving fuzzy cerebellar model articulation controller with stochastic action generation
- Learning the travelling salesperson problem requires rethinking generalization
- Estimation of distributions involving unobservable events: the case of optimal search with unknown target distributions
- Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
- Two forms of immediate reward reinforcement learning for exploratory data analysis
- Approximate Bayesian model inversion for PDEs with heterogeneous and state-dependent coefficients
- Analysis and improvement of policy gradient estimation
- Neural architecture search: a survey
- Efficient sample reuse in policy gradients with parameter-based exploration
- Compatible natural gradient policy search
- Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
- scientific article; zbMATH DE number 6982909
- Mining gold from implicit models to improve likelihood-free inference
- Learning flexible sensori-motor mappings in a complex network
- The factored policy-gradient planner
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Autonomous reinforcement learning with experience replay
- Active inference and agency: optimal control without cost functions
- scientific article; zbMATH DE number 67800
- A reinforcement learning approach to the orienteering problem with time windows
- A review on deep reinforcement learning for fluid mechanics
- Adaptive learning algorithm convergence in passive and reactive environments
- scientific article; zbMATH DE number 7306857
- Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search
- Reinforcement learning for a biped robot based on a CPG-actor-critic method
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Ancestral Gumbel-top-\(k\) sampling for sampling without replacement
- Learning to compute the metric dimension of graphs
- scientific article; zbMATH DE number 1708090
- Heavy-tails and randomized restarting beam search in goal-oriented neural sequence decoding
- Model-based Reinforcement Learning: A Survey
- Learning to attend: modeling the shaping of selectivity in infero-temporal cortex in a categorization task
- Constructing effective personalized policies using counterfactual inference from biased data sets with many features
- Solving the traveling salesperson problem with precedence constraints by deep reinforcement learning
- Dynamic graph conv-LSTM model with dynamic positional encoding for the large-scale traveling salesman problem
- A learning framework for winner-take-all networks with stochastic synapses
- scientific article; zbMATH DE number 1966632
- Variational actor-critic algorithms
- Reconstruction of incomplete wildfire data using deep generative models
- Greedy attack and Gumbel attack: generating adversarial examples for discrete data
- NeVAE: a deep generative model for molecular graphs
- Reinforcement learning theory, algorithms and its application
- Deep reinforcement learning for inventory control: a roadmap
- scientific article; zbMATH DE number 7370547
- Posterior weighted reinforcement learning with state uncertainty
- Risk-averse policy optimization via risk-neutral policy optimization
- Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
- Stochastic learning approach for binary optimization: application to Bayesian optimal design of experiments
- Reinforcement learning in sparse-reward environments with hindsight policy gradients
- Model selection in Bayesian neural networks via horseshoe priors
This page was built for publication: Simple statistical gradient-following algorithms for connectionist reinforcement learning (MaRDI item Q1812928)