Off-policy temporal difference learning with distribution adaptation in fast mixing chains (Q1797759)

scientific article; zbMATH DE number 6960085

Statements

instance of

scholarly article

title

Off-policy temporal difference learning with distribution adaptation in fast mixing chains (English)

22 October 2018

zbMATH Keywords

reinforcement learning

off-policy evaluation

least-squares temporal difference

covariate shift adaptation

mixing time

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1007/s00500-017-2490-1

cites work

Temporal Difference Methods for General Projected Equations

Q4257216

Projected equation methods for approximate solution of large linear systems

Linear least-squares algorithms for temporal difference learning

Semiparametric density estimation under a two-sample density ratio model

Policy evaluation with temporal differences: a survey and comparison

Off-policy learning with eligibility traces: a survey

A least-squares approach to direct importance estimation

Probabilistic graphical models

Markov chains and mixing times. With a chapter on "Coupling from the past" by James G. Propp and David B. Wilson

Inferences for case-control and semiparametric two-sample density ratio models

Q3311717

Q3976664

Direct importance estimation for covariate shift adaptation

An emphatic approach to the problem of off-policy temporal-difference learning

An analysis of temporal-difference learning with function approximation

model reduction for continuous-time Markovian jump systems with incomplete statistics of mode information

Mode-dependent nonrational output feedback control for continuous-time semi-Markovian jump systems with time-varying delay

Error bounds for approximations from projected linear equations

Identifiers

zbMATH Open document ID

1398.68436

DOI

10.1007/S00500-017-2490-1