Sample-efficient reinforcement learning from human feedback via information-directed sampling

DOI: 10.1109/TIT.2025.3598296
MaRDI QID: Q6921518

Authors: Han Qi, Haochen Yang, Qiaosheng Zhang, Zhuoran Yang

Publication date: 6 October 2025

Published in: IEEE Transactions on Information Theory

zbMATH Keywords

information-directed sampling; reinforcement learning from human feedback

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05); Information theory (general) (94A15)
