Kernel-based reinforcement learning

From MaRDI portal
Publication:1604813

DOI: 10.1023/A:1017928328829
zbMath: 1014.68069
MaRDI QID: Q1604813

Dirk Ormoneit, Śaunak Sen

Publication date: 8 July 2002

Published in: Machine Learning




Related Items

A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer
An algorithmic approach to optimal asset liquidation problems
Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm
Algorithms for Optimal Control of Stochastic Switching Systems
Low-discrepancy sampling for approximate dynamic programming with local approximators
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials
Hybrid MDP based integrated hierarchical Q-learning
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Efficient algorithms of pathwise dynamic programming for decision optimization in mining operations
Multi-agent DRL-based data-driven approach for PEVs charging/discharging scheduling in smart grid
Deep reinforcement trading with predictable returns
Design of experiments for the calibration of history-dependent models via deep reinforcement learning and an enhanced Kalman filter
From Reinforcement Learning to Deep Reinforcement Learning: An Overview
Approximated multi-agent fitted Q iteration
Reinforcement learning algorithms with function approximation: recent advances and applications
Graph kernels and Gaussian processes for relational reinforcement learning
Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains
Shape constraints in economics and operations research
Towards Min Max Generalization in Reinforcement Learning
Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
An Approximate Dynamic Programming Algorithm for Monotone Value Functions
Fitted Q-iteration by functional networks for control problems
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
SMART: A Stochastic Multiscale Model for the Analysis of Energy Resources, Technology, and Policy
Unnamed Item
Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis
Batch policy learning in average reward Markov decision processes