Simple statistical gradient-following algorithms for connectionist reinforcement learning

Publication:1812928

DOI: 10.1007/BF00992696; zbMath: 0772.68076; Wikidata: Q39487141; Scholia: Q39487141; MaRDI QID: Q1812928

Ronald J. Williams

Publication date: 11 August 1992

Published in: Machine Learning




Related Items (first 100 items)

A SELF-IMPROVING FUZZY CEREBELLAR MODEL ARTICULATION CONTROLLER WITH STOCHASTIC ACTION GENERATION
Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search
Adaptive learning via selectionism and Bayesianism. I: Connection between the two
Environment-driven distributed evolutionary adaptation in a population of autonomous robotic agents
The factored policy-gradient planner
Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
Learning to attend: modeling the shaping of selectivity in infero-temporal cortex in a categorization task
Learning flexible sensori-motor mappings in a complex network
Adaptive playouts for online learning of policies during Monte Carlo tree search
Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
Continuous action set learning automata for stochastic optimization
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Learning the travelling salesperson problem requires rethinking generalization
Unnamed Item
Bayesian Variational Inference for Exponential Random Graph Models
A study of mechanisms for improving robotic group performance
Stochastic Learning Approach for Binary Optimization: Application to Bayesian Optimal Design of Experiments
Learning to compute the metric dimension of graphs
A stochastic policy search model for matching behavior
Active inference and agency: optimal control without cost functions
Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
Unnamed Item
Unnamed Item
Unnamed Item
Hybrid offline/online optimization for energy management via reinforcement learning
Model-based contextual policy search for data-efficient generalization of robot skills
A reinforcement learning approach to the orienteering problem with time windows
Reinforcement learning for combinatorial optimization: a survey
Supervised Visual Attention for Simultaneous Multimodal Machine Translation
Synaptic dynamics: linear model and adaptation algorithm
Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
Immediate return preference emerged from a synaptic learning rule for return maximization
Two forms of immediate reward reinforcement learning for exploratory data analysis
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
Rationalizing predictions by adversarial information calibration
A novel online gait optimization approach for biped robots with point-feet
Unnamed Item
Policy search for motor primitives in robotics
Unnamed Item
Unnamed Item
Constructing effective personalized policies using counterfactual inference from biased data sets with many features
Dynamic graph conv-LSTM model with dynamic positional encoding for the large-scale traveling salesman problem
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
GSNs: generative stochastic networks
Analysis and improvement of policy gradient estimation
Unnamed Item
Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
A Reward-Maximizing Spiking Neuron as a Bounded Rational Decision Maker
An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
STDP-Compatible Approximation of Backpropagation in an Energy-Based Model
An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity
Autonomous reinforcement learning with experience replay
Approximate Bayesian model inversion for PDEs with heterogeneous and state-dependent coefficients
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
Deep Reinforcement Learning: A State-of-the-Art Walkthrough
A projected primal-dual gradient optimal control method for deep reinforcement learning
Set-to-Sequence Methods in Machine Learning: A Review
Deep reinforcement learning for the optimal placement of cryptocurrency limit orders
Enhance load forecastability: optimize data sampling policy by reinforcing user behaviors
Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes
A Learning Framework for Winner-Take-All Networks with Stochastic Synapses
Adaptive Learning Algorithm Convergence in Passive and Reactive Environments
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
Unnamed Item
A review on deep reinforcement learning for fluid mechanics
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
Node perturbation learning without noiseless baseline
Estimation of distributions involving unobservable events: the case of optimal search with unknown target distributions
Unnamed Item
Optimal node perturbation in linear perceptrons with uncertain eligibility trace
Model-based reinforcement learning with dimension reduction
Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
Importance sampling in reinforcement learning with an estimated behavior policy
HNS: hierarchical negative sampling for network representation learning
Branes with brains: exploring string vacua with deep reinforcement learning
Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
Compatible natural gradient policy search
TD-regularized actor-critic methods
Deep reinforcement learning for inventory control: a roadmap
Reinforcement learning in the brain
Varieties of Helmholtz machine
Risk-averse policy optimization via risk-neutral policy optimization
Unnamed Item
Unnamed Item
Autonomous vehicle navigation using evolutionary reinforcement learning
Natural actor-critic algorithms
Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
Measurement error models: from nonparametric methods to deep neural networks
Neural large neighborhood search for routing problems
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Unnamed Item
Unnamed Item
Opportunities for reinforcement learning in stochastic dynamic vehicle routing
Estimation and approximation bounds for gradient-based reinforcement learning
High generalization performance structured self-attention model for knapsack problem
Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes
Heavy-tails and randomized restarting beam search in goal-oriented neural sequence decoding




