scientific article; zbMATH DE number 6253929

From MaRDI portal

zbMath 1280.68208, MaRDI QID Q5396665

Shin Ishii, Shin-ichi Maeda, Motoaki Kawanabe, Tsuyoshi Ueno

Publication date: 3 February 2014

Full work available at URL: http://www.jmlr.org/papers/v12/ueno11a.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

reinforcement learning; estimating function; TD learning; model-free policy evaluation; semiparametric model

Mathematics Subject Classification ID

Asymptotic distribution theory in statistics (62E20); Nonparametric estimation (62G05); Learning and adaptive systems in artificial intelligence (68T05)

Related Items

Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning; Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning; Asymptotic analysis of value prediction by well-specified and misspecified models; On Generalized Bellman Equations and Temporal-Difference Learning
