On Actor-Critic Algorithms

Publication: 4443033

DOI: 10.1137/S0363012901385691
zbMath: 1049.93095
OpenAlex: W2009303086
MaRDI QID: Q4443033

Vijay R. Konda, John N. Tsitsiklis

Publication date: 8 January 2004

Published in: SIAM Journal on Control and Optimization

Full work available at URL: https://doi.org/10.1137/s0363012901385691
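
The publication above concerns the actor-critic family of reinforcement learning methods, in which a critic estimates a value function for the current policy while an actor adjusts the policy parameters along an estimated policy gradient, with the two updates typically run on different stepsize scales. As a purely illustrative aside, the Python sketch below shows one minimal instance of that scheme (a tabular TD(0) critic paired with a softmax policy-gradient actor on a small randomly generated MDP); it is not the specific function-approximation algorithms analyzed in the Konda-Tsitsiklis paper, and every environment and variable name in it is hypothetical.

# Illustrative sketch only; not the algorithms of the paper itself.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
gamma = 0.95

# Hypothetical random MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

theta = np.zeros((n_states, n_actions))  # actor parameters (softmax policy preferences)
V = np.zeros(n_states)                   # critic estimate of state values

def policy(s):
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

s = 0
for t in range(1, 50_001):
    # Two stepsize scales: the critic stepsize decays more slowly than the actor's,
    # so the critic tracks the value function of the slowly changing policy.
    alpha = 1.0 / t          # actor (slow) stepsize
    beta = 1.0 / t ** 0.6    # critic (fast) stepsize

    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update of the value estimate.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += beta * delta

    # Actor: policy-gradient step using the TD error as the advantage signal;
    # for a softmax policy, grad log pi(a|s) = one_hot(a) - pi(.|s).
    grad_log_pi = -p
    grad_log_pi[a] += 1.0
    theta[s] += alpha * delta * grad_log_pi

    s = s_next

With these decaying stepsizes the critic runs on the faster timescale, which is the structural feature that two-timescale convergence analyses of actor-critic methods rely on.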



Related Items

Convergence rate of linear two-time-scale stochastic approximation.
An incremental off-policy search in a model-free Markov decision process using a single sample path
A constrained optimization perspective on actor-critic algorithms and application to network routing
A new learning algorithm for optimal stopping
Multiscale Q-learning with linear function approximation
Asynchronous stochastic approximation with differential inclusions
Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
Reinforcement learning for a biped robot based on a CPG-actor-critic method
An adaptive actor-critic algorithm with multi-step simulated experiences for controlling nonholonomic mobile robots
Actor-critic algorithms for hierarchical Markov decision processes
Reinforcement learning based algorithms for average cost Markov decision processes
Efficient model-based reinforcement learning for approximate online optimal control
Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms
Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning
What is the value of the cross-sectional approach to deep reinforcement learning?
Simple and Optimal Methods for Stochastic Variational Inequalities, II: Markovian Noise and Policy Evaluation in Reinforcement Learning
From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming
Queueing Network Controls via Deep Reinforcement Learning
Neural circuits for learning context-dependent associations of stimuli
Stochastic optimization for real time service capacity allocation under random service demand
Immediate return preference emerged from a synaptic learning rule for return maximization
Finding intrinsic rewards by embodied evolution and constrained reinforcement learning
An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
A Small Gain Analysis of Single Timescale Actor Critic
Toward multi-target self-organizing pursuit in a partially observable Markov game
Approximate Newton Policy Gradient Algorithms
Dynamic treatment regimes: technical challenges and applications
An Improved Unconstrained Approach for Bilevel Optimization
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning
Non-iterative generation of an optimal mesh for a blade passage using deep reinforcement learning
An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents
Reward-respecting subtasks for model-based reinforcement learning
Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
Smoothing policies and safe policy gradients
Approximate stochastic annealing for online control of infinite horizon Markov decision processes
Softmax policy gradient methods can take exponential time to converge
On the sample complexity of actor-critic method for reinforcement learning with function approximation
Geometry and convergence of natural policy gradient methods
Tutorial on Amortized Optimization
Model-based reinforcement learning for approximate optimal regulation
Recent advances in reinforcement learning in finance
Multi-agent natural actor-critic reinforcement learning algorithms
An accelerated proximal algorithm for regularized nonconvex and nonsmooth bi-level optimization
Reinforcement learning algorithms with function approximation: recent advances and applications
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains
Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement
Weak convergence of dynamical systems in two timescales
Asymptotic bias of stochastic gradient search
An online actor-critic algorithm with function approximation for constrained Markov decision processes
Stabilization of stochastic approximation by step size adaptation
Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains
Reinforcement learning for a class of continuous-time input constrained optimal control problems
Autonomous reinforcement learning with experience replay
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
Deep Reinforcement Learning: A State-of-the-Art Walkthrough
On Generalized Bellman Equations and Temporal-Difference Learning
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
A tutorial on the cross-entropy method
Sell or store? An ADP approach to marketing renewable energy
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule
Linear stochastic approximation driven by slowly varying Markov chains
Real-time reinforcement learning by sequential actor-critics and experience replay
An actor-critic algorithm for constrained Markov decision processes
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
Control strategy of speed servo systems based on deep reinforcement learning
A Spiking Neural Network Model of an Actor-Critic Learning Agent
Dynamic programming and suboptimal control: a survey from ADP to MPC
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
TD-regularized actor-critic methods
Reinforcement learning in the brain
Natural actor-critic algorithms
Fundamental design principles for reinforcement learning algorithms
Mixed density methods for approximate dynamic programming
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation
Performance optimization for a class of generalized stochastic Petri nets
Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks
Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities
Estimation and approximation bounds for gradient-based reinforcement learning
Actor-Critic Algorithms with Online Feature Adaptation