Reinforcement learning with immediate rewards and linear hypotheses

Publication:1762980

DOI: 10.1007/s00453-003-1038-1
zbMATH: 1082.68039
OpenAlex: W2057050711
MaRDI QID: Q1762980

Philip M. Long, Alan W. Biermann, Naoki Abe

Publication date: 11 February 2005

Published in: Algorithmica

Full work available at URL: https://doi.org/10.1007/s00453-003-1038-1

zbMATH Keywords

Decision theory, Computational learning theory, Online learning, Online algorithms, Reinforcement learning, Dialogue systems, Immediate rewards

Mathematics Subject Classification ID

Classification and discrimination; cluster analysis (statistical aspects) (62H30)
Computational learning theory (68Q32)
Modes of computation (nondeterministic, parallel, interactive, probabilistic, etc.) (68Q10)

Related Items

Discount Targeting in Online Social Networks Using Backpressure-Based Learning
Multi-armed bandits with censored consumption of resources
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Multi-armed linear bandits with latent biases
New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability

Retrieved from "https://portal.mardi4nfdi.de/w/index.php?title=Publication:1762980&oldid=14103899"