scientific article
From MaRDI portal
Publication:2896090
zbMath: 1242.68229 ⋮ MaRDI QID: Q2896090
Peter Auer, Thomas Jaksch, Ronald Ortner
Publication date: 13 July 2012
Full work available at URL: http://www.jmlr.org/papers/v11/jaksch10a.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Learning and adaptive systems in artificial intelligence (68T05) ⋮ Markov and semi-Markov decision processes (90C40)
Related Items
Temporal concatenation for Markov decision processes ⋮ Extreme state aggregation beyond Markov decision processes ⋮ Unnamed Item ⋮ Lipschitzness is all you need to tame off-policy generative adversarial imitation learning ⋮ Adaptive aggregation for reinforcement learning in average reward Markov decision processes ⋮ Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management ⋮ Reducing reinforcement learning to KWIK online regression ⋮ Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model ⋮ Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time ⋮ Bayesian optimistic Kullback-Leibler exploration ⋮ Pessimistic value iteration for multi-task data sharing in offline reinforcement learning ⋮ Provably efficient reinforcement learning in decentralized general-sum Markov games ⋮ Settling the sample complexity of model-based offline reinforcement learning ⋮ Learning the distribution with largest mean: two bandit frameworks ⋮ Scale-free online learning ⋮ Regret bounds for restless Markov bandits ⋮ Near-optimal PAC bounds for discounted MDPs ⋮ Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations ⋮ Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Online regret bounds for Markov decision processes with deterministic transitions ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Reinforcement Learning in Robust Markov Decision Processes ⋮ Robust MDPs with k-Rectangular Uncertainty ⋮ Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies ⋮ Scale-Free Algorithms for Online Linear Optimization ⋮ Online Learning in Markov Decision Processes with Continuous Actions ⋮ Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach ⋮ Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution ⋮ Unnamed Item ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ A Bandit-Learning Approach to Multifidelity Approximation
Uses Software