Pages that link to "Item:Q4943730"
From MaRDI portal
The following pages link to The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning (Q4943730):
Displaying 50 items.
- Multiscale Q-learning with linear function approximation (Q312650)
- Learning to control a structured-prediction decoder for detection of HTTP-layer DDoS attackers (Q331696)
- Oja's algorithm for graph clustering, Markov spectral decomposition, and risk sensitive control (Q361011)
- Stochastic approximation with long range dependent and heavy tailed noise (Q383264)
- An online actor-critic algorithm with function approximation for constrained Markov decision processes (Q438776)
- On stochastic gradient and subgradient methods with adaptive steplength sequences (Q445032)
- Stabilization of stochastic approximation by step size adaptation (Q450652)
- Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization (Q523576)
- The Borkar-Meyn theorem for asynchronous stochastic approximations (Q553371)
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes (Q616967)
- Charge-based control of DiffServ-like queues (Q705183)
- A new learning algorithm for optimal stopping (Q839001)
- Cooperative dynamics and Wardrop equilibria (Q1004090)
- Natural actor-critic algorithms (Q1049136)
- Reinforcement learning for long-run average cost. (Q1427588)
- Stability of annealing schemes and related processes (Q1589565)
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method (Q1631797)
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs (Q1689603)
- Asymptotic bias of stochastic gradient search (Q1704136)
- Approachability in Stackelberg stochastic games with vector costs (Q1707454)
- Q-learning for Markov decision processes with a satisfiability criterion (Q1749413)
- Popularity signals in trial-offer markets with social influence and position bias (Q1754150)
- Error bounds for constant step-size \(Q\)-learning (Q1932736)
- Convergence and convergence rate of stochastic gradient search in the case of multiple and non-isolated extrema (Q2018557)
- On the convergence of stochastic approximations under a subgeometric ergodic Markov dynamic (Q2044347)
- A stochastic primal-dual method for optimization with conditional value at risk constraints (Q2046691)
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling (Q2051259)
- Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation (Q2070010)
- An ODE method to prove the geometric convergence of adaptive stochastic algorithms (Q2074991)
- What may lie ahead in reinforcement learning (Q2094025)
- Fundamental design principles for reinforcement learning algorithms (Q2094028)
- Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning (Q2097782)
- A sojourn-based approach to semi-Markov reinforcement learning (Q2149523)
- Simultaneous perturbation Newton algorithms for simulation optimization (Q2260692)
- Non-asymptotic error bounds for constant stepsize stochastic approximation for tracking mobile agents (Q2334631)
- An information-theoretic analysis of return maximization in reinforcement learning (Q2375396)
- Online calibrated forecasts: memory efficiency versus universality for learning in games (Q2384142)
- An adaptive optimization scheme with satisfactory transient performance (Q2390563)
- A stability criterion for two timescale stochastic approximation schemes (Q2409333)
- Avoidance of traps in stochastic approximation (Q2503522)
- Linear stochastic approximation driven by slowly varying Markov chains (Q2503529)
- Boundedness of iterates in \(Q\)-learning (Q2504669)
- Multi-armed bandits based on a variant of simulated annealing (Q2520136)
- Event-driven stochastic approximation (Q2520142)
- Reinforcement learning based algorithms for average cost Markov decision processes (Q2643632)
- Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement (Q2692526)
- Empirical Dynamic Programming (Q2806811)
- Nonlinear Gossip (Q2813309)
- Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals (Q3297666)
- Stochastic Recursive Inclusions in Two Timescales with Nonadditive Iterate-Dependent Markov Noise (Q3387930)