Reinforcement learning with immediate rewards and linear hypotheses
From MaRDI portal
Publication:1762980
DOI: 10.1007/s00453-003-1038-1 · zbMATH: 1082.68039 · OpenAlex: W2057050711 · MaRDI QID: Q1762980
Philip M. Long, Alan W. Biermann, Naoki Abe
Publication date: 11 February 2005
Published in: Algorithmica
Full work available at URL: https://doi.org/10.1007/s00453-003-1038-1
Decision theory; Computational learning theory; Online learning; Online algorithms; Reinforcement learning; Dialogue systems; Immediate rewards
Classification and discrimination; cluster analysis (statistical aspects) (62H30)
Computational learning theory (68Q32)
Modes of computation (nondeterministic, parallel, interactive, probabilistic, etc.) (68Q10)
Related Items
Discount Targeting in Online Social Networks Using Backpressure-Based Learning
Multi-armed bandits with censored consumption of resources
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Multi-armed linear bandits with latent biases
New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability