Filtered Poisson process bandit on a continuum
From MaRDI portal
Abstract: We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over \(T\) rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys \(O(T^{2/3})\) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.
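The abstract's approach can be illustrated with a simplified sketch. The paper's algorithm uses *data-adaptive* discretisation; the toy version below (not the authors' method) runs standard UCB over a fixed grid of roughly \(T^{1/3}\) cells on \([0,1]\), which under a Lipschitz reward already attains the \(O(T^{2/3})\) rate. The Poisson feedback model, function names, and the example reward function are assumptions for illustration only.

```python
import math
import random

def poisson_draw(rng, lam):
    """Sample from Poisson(lam) via Knuth's multiplication method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def discretised_ucb(reward_fn, T, seed=0):
    """UCB over a fixed grid of K ~ T^(1/3) bins on [0, 1].

    Each pull of a bin observes a Poisson count whose mean is
    reward_fn(x) at the bin centre, standing in for the number of
    revealed points in the filtered sample. Returns the total
    observed reward and the per-bin pull counts.
    """
    rng = random.Random(seed)
    K = max(1, round(T ** (1.0 / 3.0)))          # grid resolution
    centres = [(k + 0.5) / K for k in range(K)]  # one arm per bin
    counts = [0] * K                             # pulls per bin
    sums = [0.0] * K                             # cumulative reward per bin
    total = 0
    for t in range(1, T + 1):
        # Upper confidence bound: empirical mean + exploration bonus.
        def ucb(k):
            if counts[k] == 0:
                return float("inf")              # force one pull of each bin
            return sums[k] / counts[k] + math.sqrt(2.0 * math.log(t) / counts[k])
        k = max(range(K), key=ucb)
        r = poisson_draw(rng, reward_fn(centres[k]))
        counts[k] += 1
        sums[k] += r
        total += r
    return total, counts
```

For example, with the Lipschitz reward function `lambda x: 2.0 - 2.0 * abs(x - 0.5)` (peak mean 2 at the midpoint, mean 1 at the endpoints), the sketch concentrates its pulls near the peak and its average per-round reward exceeds the ~1.5 obtained by uniformly random play.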
Cites work
- scientific article; zbMATH DE number 5485582 (no title available)
- A general class of exponential inequalities for martingales and ratios
- Adaptive policies for perimeter surveillance problems
- Bandits With Heavy Tail
- Lipschitz bandits without the Lipschitz constant
- The Continuum-Armed Bandit Problem
- The Nonstochastic Multiarmed Bandit Problem
- \(X\)-armed bandits
Cited in (2)
MaRDI item Q2239901