Near-optimal reinforcement learning in polynomial time

Publication:1604817

DOI: 10.1023/A:1017984413808 · zbMATH: 1014.68071 · MaRDI QID: Q1604817

Michael Kearns, Satinder Pal Singh

Publication date: 8 July 2002

Published in: Machine Learning

zbMATH Keywords

Markov decision processes; reinforcement learning

Mathematics Subject Classification ID

Computational learning theory (68Q32)

Related Items

Robust Control for Dynamical Systems with Non-Gaussian Noise via Formal Abstractions ⋮ Unnamed Item ⋮ Relational reinforcement learning with guided demonstrations ⋮ Reducing reinforcement learning to KWIK online regression ⋮ Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time ⋮ A framework for transforming specifications in reinforcement learning ⋮ Identity concealment games: how I learned to stop revealing and love the coincidences ⋮ Unnamed Item ⋮ Knows what it knows: a framework for self-aware learning ⋮ Unnamed Item ⋮ Bayesian optimistic Kullback-Leibler exploration ⋮ Certified reinforcement learning with logic guidance ⋮ Reinforcement Learning, Bit by Bit ⋮ Recent advances in reinforcement learning in finance ⋮ Specification-guided reinforcement learning ⋮ Online Regret Bounds for Markov Decision Processes with Deterministic Transitions ⋮ Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains ⋮ Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods ⋮ Efficient PAC learning for episodic tasks with acyclic state spaces ⋮ Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task ⋮ Unnamed Item ⋮ Deep reinforcement learning with temporal logics ⋮ An analysis of model-based interval estimation for Markov decision processes ⋮ Computer science and decision theory ⋮ Online regret bounds for Markov decision processes with deterministic transitions ⋮ Efficient exploration through active learning for value function approximation in reinforcement learning ⋮ An \(AO^*\) Based Exact Algorithm for the Canadian Traveler Problem ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ No regrets about no-regret ⋮ Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies ⋮ Inverse reinforcement learning in contextual MDPs ⋮ A near-optimal polynomial time algorithm for learning in certain classes of stochastic games ⋮ Unnamed Item
