The Markovian price of information

DOI10.1007/978-3-030-17953-3_18MaRDI QIDQ2293091zbMATH OpenOpenAlexFDO

Authors Anupam Gupta, Haotian Jiang, Ziv Scully, Sahil Singla

Publication date 6 February 2020

Full work available at URL https://arxiv.org/abs/1902.07856

zbMATH Keywords

Gittins index multi-armed bandits probing algorithms

Mathematics Subject Classification ID

Combinatorial optimization (90C27) Markov and semi-Markov decision processes (90C40)

Abstract: Suppose there are

n

Markov chains and we need to pay a per-step emph{price} to advance them. The "destination" states of the Markov chains contain rewards; however, we can only get rewards for a subset of them that satisfy a combinatorial constraint, e.g., at most

k

of them, or they are acyclic in an underlying graph. What strategy should we choose to advance the Markov chains if our goal is to maximize the total reward emph{minus} the total price that we pay? In this paper we introduce a Markovian price of information model to capture settings such as the above, where the input parameters of a combinatorial optimization problem are given via Markov chains. We design optimal/approximation algorithms that jointly optimize the value of the combinatorial problem and the total paid price. We also study emph{robustness} of our algorithms to the distribution parameters and how to handle the emph{commitment} constraint. Our work brings together two classical lines of investigation: getting optimal strategies for Markovian multi-armed bandits, and getting exact and approximation algorithms for discrete optimization problems using combinatorial as well as linear-programming relaxation ideas.

Recommendations

Cited in

(8)

This page was built for publication: The Markovian price of information

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2293091)