Simple statistical gradient-following algorithms for connectionist reinforcement learning
From MaRDI portal
Cites work
- scientific article; zbMATH DE number 4066707
- scientific article; zbMATH DE number 3657150
- scientific article; zbMATH DE number 3551675
- A new approach to the design of reinforcement schemes for learning automata
- An N-player sequential stochastic game with identical payoffs
- Associative search network: A reinforcement learning associative memory
- Decentralized learning in finite Markov chains
- Pattern-recognizing stochastic learning automata
Cited in
- Policy search for motor primitives in robotics
- Reward prediction errors, not sensory prediction errors, play a major role in model selection in human reinforcement learning
- DAPath: distance-aware knowledge graph reasoning based on deep reinforcement learning
- Two forms of immediate reward reinforcement learning for exploratory data analysis
- scientific article; zbMATH DE number 7370594
- Hybrid offline/online optimization for energy management via reinforcement learning
- A reinforcement learning approach to the orienteering problem with time windows
- A SELF-IMPROVING FUZZY CEREBELLAR MODEL ARTICULATION CONTROLLER WITH STOCHASTIC ACTION GENERATION
- Importance sampling in reinforcement learning with an estimated behavior policy
- HNS: hierarchical negative sampling for network representation learning
- Accelerating Primal-Dual Methods for Regularized Markov Decision Processes
- Consolidation of structure of high noise data by a new noise index and reinforcement learning
- Measurement error models: from nonparametric methods to deep neural networks
- Natural reweighted wake-sleep
- A stochastic policy search model for matching behavior
- Environment-driven distributed evolutionary adaptation in a population of autonomous robotic agents
- Model-based reinforcement learning with dimension reduction
- scientific article; zbMATH DE number 1708090
- scientific article; zbMATH DE number 1424385
- Discovering diverse solutions in deep reinforcement learning by maximizing state-action-based mutual information
- Graph Neural Networks for Natural Language Processing: A Survey
- Recurrent policy gradients
- Efficient sample reuse in policy gradients with parameter-based exploration
- Rationalizing predictions by adversarial information calibration
- Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
- Elman Backpropagation as Reinforcement for Simple Recurrent Networks
- Scaling up stochastic gradient descent for non-convex optimisation
- Adaptive learning algorithm convergence in passive and reactive environments
- Smoothing policies and safe policy gradients
- Model-based Reinforcement Learning: A Survey
- Occupancy information ratio: infinite-horizon, information-directed, parameterized policy search
- GSNs: generative stochastic networks
- The evolutionary dynamics of soft-max policy gradient in multi-agent settings
- An attention model for the formation of collectives in real-world domains
- Almost surely safe exploration and exploitation for deep reinforcement learning with state safety estimation
- Policy search for active fault diagnosis with partially observable state
- Model selection in Bayesian neural networks via horseshoe priors
- Reinforcement learning in the brain
- Dynamic graph conv-LSTM model with dynamic positional encoding for the large-scale traveling salesman problem
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
- scientific article; zbMATH DE number 1966632
- Reinforcement learning in sparse-reward environments with hindsight policy gradients
- scientific article; zbMATH DE number 7625182
- The first AI4TSP competition: learning to solve stochastic routing problems
- Risk-constrained reinforcement learning with percentile risk criteria
- Varieties of Helmholtz machine
- An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity
- Opportunities for reinforcement learning in stochastic dynamic vehicle routing
- Model-based contextual policy search for data-efficient generalization of robot skills
- Deep reinforcement learning for inventory control: a roadmap
- Risk-averse policy optimization via risk-neutral policy optimization
- ARL: analogical reinforcement learning for knowledge graph reasoning
- scientific article; zbMATH DE number 7370547
- Compatible natural gradient policy search
- scientific article; zbMATH DE number 6982909
- Neural large neighborhood search for routing problems
- Full gradient DQN reinforcement learning: a provably convergent scheme
- Actor prioritized experience replay
- Bayesian Variational Inference for Exponential Random Graph Models
- Learning automata algorithms for pattern classification
- Gaussian variational approximations for high-dimensional state space models
- Global convergence of policy gradient methods to (almost) locally optimal policies
- A projected primal-dual gradient optimal control method for deep reinforcement learning
- Dynamic neural Turing machine with continuous and discrete addressing schemes
- Reinforcement learning theory, algorithms and its application
- Personalized dynamic treatment regimes in continuous time: a Bayesian approach for optimizing clinical decisions with timing
- Learning the travelling salesperson problem requires rethinking generalization
- Optimal differentiated threshold characterization for multi-task stochastic deadline scheduling with queuing
- An RNN-policy gradient approach for quantum architecture search
- Task-aware verifiable RNN-based policies for partially observable Markov decision processes
- Constructing effective personalized policies using counterfactual inference from biased data sets with many features
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
- Approximate Bayesian model inversion for PDEs with heterogeneous and state-dependent coefficients
- Deep reinforcement learning for option pricing and hedging under dynamic expectile risk measures
- Neural architecture search: a survey
- Automated Deep Learning: Neural Architecture Search Is Not the End
- Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge
- Efficient multi-objective neural architecture search framework via policy gradient algorithm
- Leveraging randomized smoothing for optimal control of nonsmooth dynamical systems
- Differentiable particle filters with smoothly jittered resampling
- Development of a machine learning-based design optimization method for crashworthiness analysis
- Pattern-recognizing stochastic learning automata
- Branes with brains: exploring string vacua with deep reinforcement learning
- Ancestral Gumbel-top-\(k\) sampling for sampling without replacement
- TD-regularized actor-critic methods
- Geometry and convergence of natural policy gradient methods
- Supervised Visual Attention for Simultaneous Multimodal Machine Translation
- A differential Hebbian framework for biologically-plausible motor control
- Knowledge graph embedding with shared latent semantic units
- Variational actor-critic algorithms
- The factored policy-gradient planner
- Estimation and approximation bounds for gradient-based reinforcement learning
- Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
- Robust flow control and optimal sensor placement using deep reinforcement learning
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Learning flexible sensori-motor mappings in a complex network
- Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm