scientific article

From MaRDI portal
Publication:2896090

zbMath: 1242.68229
MaRDI QID: Q2896090

Peter Auer, Thomas Jaksch, Ronald Ortner

Publication date: 13 July 2012

Full work available at URL: http://www.jmlr.org/papers/v11/jaksch10a.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.



Related Items

Temporal concatenation for Markov decision processes
Extreme state aggregation beyond Markov decision processes
Unnamed Item
Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
Adaptive aggregation for reinforcement learning in average reward Markov decision processes
Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management
Reducing reinforcement learning to KWIK online regression
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time
Bayesian optimistic Kullback-Leibler exploration
Pessimistic value iteration for multi-task data sharing in offline reinforcement learning
Provably efficient reinforcement learning in decentralized general-sum Markov games
Settling the sample complexity of model-based offline reinforcement learning
Learning the distribution with largest mean: two bandit frameworks
Scale-free online learning
Regret bounds for restless Markov bandits
Near-optimal PAC bounds for discounted MDPs
Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations
Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items
Unnamed Item
Unnamed Item
Online regret bounds for Markov decision processes with deterministic transitions
Learning to Optimize via Information-Directed Sampling
Reinforcement Learning in Robust Markov Decision Processes
Robust MDPs with k-Rectangular Uncertainty
Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
Scale-Free Algorithms for Online Linear Optimization
Online Learning in Markov Decision Processes with Continuous Actions
Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution
Unnamed Item
Multi-agent reinforcement learning: a selective overview of theories and algorithms
A Bandit-Learning Approach to Multifidelity Approximation


Uses Software