On the possibility of learning in reactive environments with arbitrary dependence

DOI: 10.1016/J.TCS.2008.06.039
zbMATH Open: 1158.68039
arXiv: 0810.5636
OpenAlex: W1969028245
Wikidata: Q58012401 (Scholia: Q58012401)
MaRDI QID: Q950202
FDO: Q950202

Daniil Ryabko, Marcus Hutter

Publication date: 22 October 2008

Published in: Theoretical Computer Science

Abstract: We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions, i.e., environments more general than (PO)MDPs. The task for an agent is to attain the best possible asymptotic reward when the true generating environment is unknown but belongs to a known countable family of environments. We identify sufficient conditions on the class of environments under which there exists an agent that attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to various probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.
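For orientation, one common formalization of this setting (the notation here is illustrative, not quoted from the paper): at each time step t an agent following a policy \pi chooses an action a_t, and the environment \mu returns an observation o_t and a reward r_t \in [0,1], where \mu may depend arbitrarily on the whole history a_1 o_1 r_1 \dots a_{t-1} o_{t-1} r_{t-1} a_t. Asymptotic performance can then be measured by the average value

    \bar{V}(\mu, \pi) \;:=\; \liminf_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}_{\mu}\!\left[ \sum_{t=1}^{n} r_t \right],

and an agent \pi is optimal for a countable class \mathcal{C} of environments if \bar{V}(\nu, \pi) = \sup_{\pi'} \bar{V}(\nu, \pi') for every \nu \in \mathcal{C}. The paper's sufficient conditions on \mathcal{C} guarantee that such a \pi exists.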


Full work available at URL: https://arxiv.org/abs/0810.5636







Cited in: 4 publications






