scientific article; zbMATH DE number 7370615
Publication:4999029
Jason D. Lee, Alekh Agarwal, Sham M. Kakade, Gaurav Mahajan
Publication date: 9 July 2021
Full work available at URL: https://arxiv.org/abs/1908.00261
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Related Items
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
- Model-free design of stochastic LQR controller from a primal-dual optimization perspective
- Scalable Reinforcement Learning for Multiagent Networked Systems
- On linear and super-linear convergence of natural policy gradient algorithm
- Softmax policy gradient methods can take exponential time to converge
- Geometry and convergence of natural policy gradient methods
- Recent advances in reinforcement learning in finance
- Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
- Multi-agent natural actor-critic reinforcement learning algorithms
- Towards multi‐agent reinforcement learning‐driven over‐the‐counter market simulations
- Reinforcement learning with dynamic convex risk measures
Cites Work
- Accelerated gradient methods for nonconvex nonlinear and stochastic programming
- Random design analysis of ridge regression
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Natural actor-critic algorithms
- A decision-theoretic generalization of on-line learning and an application to boosting
- Near-optimal reinforcement learning in polynomial time
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Cubic regularization of Newton method and its global performance
- DOI: 10.1162/153244303765208377
- Online Learning and Online Convex Optimization
- Online Markov Decision Processes
- Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality
- Functional Approximations and Dynamic Programming
- First-Order Methods in Optimization
- The Łojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems
- Prediction, Learning, and Games
- Understanding Machine Learning