Near-optimal reinforcement learning in polynomial time
Publication:1604817
DOI: 10.1023/A:1017984413808, zbMath: 1014.68071, MaRDI QID: Q1604817
Michael Kearns, Satinder Pal Singh
Publication date: 8 July 2002
Published in: Machine Learning
Related Items
Robust Control for Dynamical Systems with Non-Gaussian Noise via Formal Abstractions ⋮ Unnamed Item ⋮ Relational reinforcement learning with guided demonstrations ⋮ Reducing reinforcement learning to KWIK online regression ⋮ Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time ⋮ A framework for transforming specifications in reinforcement learning ⋮ Identity concealment games: how I learned to stop revealing and love the coincidences ⋮ Unnamed Item ⋮ Knows what it knows: a framework for self-aware learning ⋮ Unnamed Item ⋮ Bayesian optimistic Kullback-Leibler exploration ⋮ Certified reinforcement learning with logic guidance ⋮ Reinforcement Learning, Bit by Bit ⋮ Recent advances in reinforcement learning in finance ⋮ Specification-guided reinforcement learning ⋮ Online Regret Bounds for Markov Decision Processes with Deterministic Transitions ⋮ Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains ⋮ Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods ⋮ Efficient PAC learning for episodic tasks with acyclic state spaces ⋮ Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task ⋮ Unnamed Item ⋮ Deep reinforcement learning with temporal logics ⋮ An analysis of model-based interval estimation for Markov decision processes ⋮ Computer science and decision theory ⋮ Online regret bounds for Markov decision processes with deterministic transitions ⋮ Efficient exploration through active learning for value function approximation in reinforcement learning ⋮ An AO* Based Exact Algorithm for the Canadian Traveler Problem ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ No regrets about no-regret ⋮ Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies ⋮ Inverse reinforcement learning in contextual MDPs ⋮ A near-optimal polynomial time algorithm for learning in certain classes of stochastic games ⋮ Unnamed Item