Off-policy temporal difference learning with distribution adaptation in fast mixing chains (Q1797759)
scientific article; zbMATH DE number 6960085
| Language | Label | Description | Also known as |
|---|---|---|---|
| default for all languages | No label defined | | |
| English | Off-policy temporal difference learning with distribution adaptation in fast mixing chains | scientific article; zbMATH DE number 6960085 | |
Statements
Off-policy temporal difference learning with distribution adaptation in fast mixing chains (English)
22 October 2018
reinforcement learning
off-policy evaluation
least-squares temporal difference
covariate shift adaptation
mixing time
0.7548161745071411
0.7503262162208557
0.7460692524909973
0.7428575754165649