Off-policy temporal difference learning with distribution adaptation in fast mixing chains

From MaRDI portal
Publication:1797759