Performance Loss Bounds for Approximate Value Iteration with State Aggregation

Publication:5387976

DOI 10.1287/moor.1060.0188; zbMATH 1278.90424; OpenAlex W2134932665; MaRDI QID Q5387976

Benjamin Van Roy

Publication date: 27 May 2008

Published in: Mathematics of Operations Research

Full work available at URL: https://semanticscholar.org/paper/da52cd530b661553b9abb1acefcfc16e45cd1b8b

zbMATH Keywords

state aggregation; temporal-difference learning; approximate value iteration

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05); Dynamic programming (90C39); Markov and semi-Markov decision processes (90C40)

Related Items

Approximate policy iteration: a survey and some new methods ⋮ Revenue management for operations with urgent orders ⋮ Approximate dynamic programming with state aggregation applied to UAV perimeter patrol ⋮ A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces ⋮ Adaptive aggregation for reinforcement learning in average reward Markov decision processes ⋮ State partitioning based linear program for stochastic dynamic programs: an invariance property ⋮ Continuity of cost in Borkar control topology and implications on discrete space and time approximations for controlled diffusions under several criteria ⋮ Approximate dynamic programming via direct search in the space of value function approximations ⋮ A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs