On the possibility of learning in reactive environments with arbitrary dependence

DOI: 10.1016/J.TCS.2008.06.039
zbMATH Open: 1158.68039
arXiv: 0810.5636
OpenAlex: W1969028245
Wikidata: Q58012401 (Scholia: Q58012401)
MaRDI QID: Q950202
FDO: Q950202

Daniil Ryabko, Marcus Hutter

Publication date: 22 October 2008

Published in: Theoretical Computer Science

Abstract: We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions, i.e., environments more general than (PO)MDPs. The task for an agent is to attain the best possible asymptotic reward when the true generating environment is unknown but belongs to a known countable family of environments. We identify sufficient conditions on the class of environments under which there exists an agent that attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to various probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.
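For orientation, one common formalization of this setting (the notation here is illustrative, not quoted from the paper): at each time step t an agent following a policy \pi chooses an action a_t, and the environment \mu returns an observation o_t and a reward r_t \in [0,1], where \mu may depend arbitrarily on the whole history a_1 o_1 r_1 \dots a_{t-1} o_{t-1} r_{t-1} a_t. Asymptotic performance can then be measured by the average value

    \bar{V}(\mu, \pi) \;:=\; \liminf_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}_{\mu}\!\left[ \sum_{t=1}^{n} r_t \right],

and an agent \pi is optimal for a countable class \mathcal{C} of environments if \bar{V}(\nu, \pi) = \sup_{\pi'} \bar{V}(\nu, \pi') for every \nu \in \mathcal{C}. The paper's sufficient conditions on \mathcal{C} guarantee that such a \pi exists.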


Full work available at URL: https://arxiv.org/abs/0810.5636







Cited in: 4 publications






