Off-policy temporal difference learning with distribution adaptation in fast mixing chains
Cites work
- scientific article; zbMATH DE number 3841285 (no title available)
- scientific article; zbMATH DE number 19232 (no title available)
- scientific article; zbMATH DE number 1321699 (no title available)
- A least-squares approach to direct importance estimation
- An analysis of temporal-difference learning with function approximation
- An emphatic approach to the problem of off-policy temporal-difference learning
- Direct importance estimation for covariate shift adaptation
- Error bounds for approximations from projected linear equations
- Inferences for case-control and semiparametric two-sample density ratio models
- Linear least-squares algorithms for temporal difference learning
- Markov chains and mixing times. With a chapter on "Coupling from the past" by James G. Propp and David B. Wilson.
- Mode-dependent nonrational output feedback control for continuous-time semi-Markovian jump systems with time-varying delay
- Off-policy learning with eligibility traces: a survey
- Policy evaluation with temporal differences: a survey and comparison
- Probabilistic graphical models.
- Projected equation methods for approximate solution of large linear systems
- Semiparametric density estimation under a two-sample density ratio model
- Temporal Difference Methods for General Projected Equations
- H∞ model reduction for continuous-time Markovian jump systems with incomplete statistics of mode information
Cited in (2)
MaRDI item: Q1797759