A Structured Multiarmed Bandit Problem and the Greedy Policy

Publication:4974829

DOI: 10.1109/TAC.2009.2031725, zbMath: 1367.90115, OpenAlex: W2115293355, MaRDI QID: Q4974829

Paat Rusmevichientong, Adam J. Mersereau, John N. Tsitsiklis

Publication date: 8 August 2017

Published in: IEEE Transactions on Automatic Control

Full work available at URL: https://doi.org/10.1109/tac.2009.2031725

Mathematics Subject Classification ID

Applications of mathematical programming (90C90)
Markov and semi-Markov decision processes (90C40)

Related Items (5)

A linear response bandit problem
Response-adaptive designs for clinical trials: simultaneous learning from multiple patients
Bayesian policy reuse
Learning in Combinatorial Optimization: What and How to Explore
Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives
