The following pages link to (Q5477859):
Displaying 37 items.
- Potential-based least-squares policy iteration for a parameterized feedback control system (Q289143)
- Approximate dynamic programming for the dispatch of military medical evacuation assets (Q323422)
- Perspectives of approximate dynamic programming (Q333093)
- Batch mode reinforcement learning based on the synthesis of artificial trajectories (Q378762)
- The optimal unbiased value estimator and its relation to LSTD, TD and MC (Q415609)
- Asymptotic analysis of value prediction by well-specified and misspecified models (Q448322)
- Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach (Q511735)
- A two-level optimization model for elective surgery scheduling with downstream capacity constraints (Q666974)
- Proximal algorithms and temporal difference methods for solving fixed point problems (Q721950)
- Solving factored MDPs using non-homogeneous partitions (Q814475)
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning (Q859737)
- Reinforcement learning algorithms with function approximation: recent advances and applications (Q903601)
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (Q1009248)
- Projected equation methods for approximate solution of large linear systems (Q1012492)
- Natural actor-critic algorithms (Q1049136)
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method (Q1631797)
- Approximate dynamic programming for missile defense interceptor fire control (Q1751900)
- Off-policy temporal difference learning with distribution adaptation in fast mixing chains (Q1797759)
- Hybrid least-squares algorithms for approximate policy evaluation (Q1959511)
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling (Q2051259)
- Challenges of real-world reinforcement learning: definitions, benchmarks and analysis (Q2071388)
- Improving defensive air battle management by solving a stochastic dynamic assignment problem via approximate dynamic programming (Q2103047)
- Approximate dynamic programming for the military inventory routing problem (Q2173135)
- A Q-learning predictive control scheme with guaranteed stability (Q2220029)
- An approximate dynamic programming approach for comparing firing policies in a networked air defense environment (Q2297577)
- Reinforcement learning for a biped robot based on a CPG-actor-critic method (Q2383520)
- Restricted gradient-descent algorithm for value-function approximation in reinforcement learning (Q2389624)
- Dynamic portfolio choice: a simulation-and-regression approach (Q2402578)
- Basis function adaptation in temporal difference reinforcement learning (Q2485935)
- (Q2741536)
- Convergence of the standard RLS method and UDUᵀ factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming (Q2792939)
- Approximate policy iteration: a survey and some new methods (Q2887629)
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications (Q2887630)
- (Q5168869)
- Dopamine Ramps Are a Consequence of Reward Prediction Errors (Q5378329)
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage (Q5882386)
- Recent advances in reinforcement learning in finance (Q6146668)