scientific article; zbMATH DE number 7370615
From MaRDI portal
Publication:4999029
Jason D. Lee, Alekh Agarwal, Sham M. Kakade, Gaurav Mahajan
Publication date: 9 July 2021
Full work available at URL: https://arxiv.org/abs/1908.00261
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Related Items (11)
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
- Model-free design of stochastic LQR controller from a primal-dual optimization perspective
- Scalable Reinforcement Learning for Multiagent Networked Systems
- On linear and super-linear convergence of natural policy gradient algorithm
- Softmax policy gradient methods can take exponential time to converge
- Geometry and convergence of natural policy gradient methods
- Recent advances in reinforcement learning in finance
- Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
- Multi-agent natural actor-critic reinforcement learning algorithms
- Towards multi‐agent reinforcement learning‐driven over‐the‐counter market simulations
- Reinforcement learning with dynamic convex risk measures
Cites Work
- Accelerated gradient methods for nonconvex nonlinear and stochastic programming
- Random design analysis of ridge regression
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Natural actor-critic algorithms
- A decision-theoretic generalization of on-line learning and an application to boosting
- Near-optimal reinforcement learning in polynomial time
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Cubic regularization of Newton method and its global performance
- 10.1162/153244303765208377
- Online Learning and Online Convex Optimization
- Online Markov Decision Processes
- Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality
- Functional Approximations and Dynamic Programming
- First-Order Methods in Optimization
- The Łojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems
- Prediction, Learning, and Games
- Understanding Machine Learning