Filtered Poisson process bandit on a continuum
From MaRDI portal
Abstract: We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over \(T\) rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys \(O(T^{2/3})\) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.
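The abstract's approach can be illustrated with a simplified sketch. The paper's algorithm uses *data-adaptive* discretisation; the toy version below (not the authors' method) runs standard UCB over a fixed grid of roughly \(T^{1/3}\) cells on \([0,1]\), which under a Lipschitz reward already attains the \(O(T^{2/3})\) rate. The Poisson feedback model, function names, and the example reward function are assumptions for illustration only.

```python
import math
import random

def poisson_draw(rng, lam):
    """Sample from Poisson(lam) via Knuth's multiplication method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def discretised_ucb(reward_fn, T, seed=0):
    """UCB over a fixed grid of K ~ T^(1/3) bins on [0, 1].

    Each pull of a bin observes a Poisson count whose mean is
    reward_fn(x) at the bin centre, standing in for the number of
    revealed points in the filtered sample. Returns the total
    observed reward and the per-bin pull counts.
    """
    rng = random.Random(seed)
    K = max(1, round(T ** (1.0 / 3.0)))          # grid resolution
    centres = [(k + 0.5) / K for k in range(K)]  # one arm per bin
    counts = [0] * K                             # pulls per bin
    sums = [0.0] * K                             # cumulative reward per bin
    total = 0
    for t in range(1, T + 1):
        # Upper confidence bound: empirical mean + exploration bonus.
        def ucb(k):
            if counts[k] == 0:
                return float("inf")              # force one pull of each bin
            return sums[k] / counts[k] + math.sqrt(2.0 * math.log(t) / counts[k])
        k = max(range(K), key=ucb)
        r = poisson_draw(rng, reward_fn(centres[k]))
        counts[k] += 1
        sums[k] += r
        total += r
    return total, counts
```

For example, with the Lipschitz reward function `lambda x: 2.0 - 2.0 * abs(x - 0.5)` (peak mean 2 at the midpoint, mean 1 at the endpoints), the sketch concentrates its pulls near the peak and its average per-round reward exceeds the ~1.5 obtained by uniformly random play.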
Cites work
- scientific article; zbMATH DE number 5485582 (no title available)
- A general class of exponential inequalities for martingales and ratios
- Adaptive policies for perimeter surveillance problems
- Bandits With Heavy Tail
- Lipschitz bandits without the Lipschitz constant
- The Continuum-Armed Bandit Problem
- The Nonstochastic Multiarmed Bandit Problem
- \(X\)-armed bandits
Cited in (2)
MaRDI item Q2239901