Off-policy temporal difference learning with distribution adaptation in fast mixing chains
Publication: 1797759
DOI: 10.1007/s00500-017-2490-1
zbMath: 1398.68436
OpenAlex: W2581667043
MaRDI QID: Q1797759
Publication date: 22 October 2018
Published in: Soft Computing
Full work available at URL: https://doi.org/10.1007/s00500-017-2490-1
Keywords: reinforcement learning; mixing time; covariate shift adaptation; least-squares temporal difference; off-policy evaluation
Cites Work
- Direct importance estimation for covariate shift adaptation
- Projected equation methods for approximate solution of large linear systems
- Semiparametric density estimation under a two-sample density ratio model
- Mode-dependent nonrational output feedback control for continuous-time semi-Markovian jump systems with time-varying delay
- Error Bounds for Approximations from Projected Linear Equations
- Inferences for case-control and semiparametric two-sample density ratio models
- An analysis of temporal-difference learning with function approximation
- $\mathcal{H}_\infty$ model reduction for continuous-time Markovian jump systems with incomplete statistics of mode information
- Temporal Difference Methods for General Projected Equations