A survey on online learning methods: Thompson sampling and others

Publication:3176079

DOI: 10.15960/J.CNKI.ISSN.1007-6093.2017.04.006
MaRDI QID: Q3176079

Authors: Simai He, Yu Jia Jin, Hua Wang, Dongdong Ge

Publication date: 18 July 2018

zbMATH Keywords

multi-armed bandit, online learning, upper confidence bound, online convex optimization, Thompson sampling

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)
Convex programming (90C25)
Sampling theory, sample surveys (62D05)
Stopping times; optimal stopping problems; gambling theory (60G40)

Cited in (1)

A Survey of Preference-Based Online Learning with Bandit Algorithms
