The following pages link to (Q2934010):
Displaying 22 items.
- An incremental off-policy search in a model-free Markov decision process using a single sample path (Q1621868)
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method (Q1631797)
- Off-policy temporal difference learning with distribution adaptation in fast mixing chains (Q1797759)
- Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization (Q2089785)
- Multi-agent reinforcement learning: a selective overview of theories and algorithms (Q2094040)
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics (Q2112808)
- On Generalized Bellman Equations and Temporal-Difference Learning (Q3305109)
- Accelerating Stochastic Composition Optimization (Q4637024)
- MultiLevel Composite Stochastic Optimization via Nested Variance Reduction (Q4987278)
- A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation (Q5003727)
- Simple and Optimal Methods for Stochastic Variational Inequalities, II: Markovian Noise and Policy Evaluation in Reinforcement Learning (Q5081106)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis (Q5162625)
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage (Q5882386)
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic (Q5883319)
- Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation (Q5885838)
- Hybrid SGD algorithms to solve stochastic composite optimization problems with application in sparse portfolio selection problems (Q6049347)
- Stochastic composition optimization of functions without Lipschitz continuous gradient (Q6108982)
- Gradient temporal-difference learning for off-policy evaluation using emphatic weightings (Q6146179)
- Multi-agent natural actor-critic reinforcement learning algorithms (Q6159507)
- Approximated multi-agent fitted Q iteration (Q6174070)
- Distributed entropy-regularized multi-agent reinforcement learning with policy consensus (Q6550247)
- A functional model method for nonconvex nonsmooth conditional stochastic optimization (Q6622742)