Near-optimal reinforcement learning in polynomial time

From MaRDI portal
Publication:1604817

DOI: 10.1023/A:1017984413808; zbMath: 1014.68071; MaRDI QID: Q1604817

Michael Kearns, Satinder Pal Singh

Publication date: 8 July 2002

Published in: Machine Learning




Related Items

Robust Control for Dynamical Systems with Non-Gaussian Noise via Formal Abstractions
Unnamed Item
Relational reinforcement learning with guided demonstrations
Reducing reinforcement learning to KWIK online regression
Explicit explore, exploit, or escape (E⁴): near-optimal safety-constrained reinforcement learning in polynomial time
A framework for transforming specifications in reinforcement learning
Identity concealment games: how I learned to stop revealing and love the coincidences
Unnamed Item
Knows what it knows: a framework for self-aware learning
Unnamed Item
Bayesian optimistic Kullback-Leibler exploration
Certified reinforcement learning with logic guidance
Reinforcement Learning, Bit by Bit
Recent advances in reinforcement learning in finance
Specification-guided reinforcement learning
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains
Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods
Efficient PAC learning for episodic tasks with acyclic state spaces
Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task
Unnamed Item
Deep reinforcement learning with temporal logics
An analysis of model-based interval estimation for Markov decision processes
Computer science and decision theory
Online regret bounds for Markov decision processes with deterministic transitions
Efficient exploration through active learning for value function approximation in reinforcement learning
An AO* Based Exact Algorithm for the Canadian Traveler Problem
Bayesian Exploration for Approximate Dynamic Programming
No regrets about no-regret
Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
Inverse reinforcement learning in contextual MDPs
A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
Unnamed Item



