scientific article; zbMATH DE number 1753152
Publication:4533362
zbMath: 0994.68119; MaRDI QID: Q4533362
Jonathan Baxter, Peter L. Bartlett
Publication date: 13 October 2002
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Related Items (36)
An incremental off-policy search in a model-free Markov decision process using a single sample path ⋮ The factored policy-gradient planner ⋮ A policy gradient method for semi-Markov decision processes with application to call admission control ⋮ A stochastic policy search model for matching behavior ⋮ Queueing Network Controls via Deep Reinforcement Learning ⋮ Simulation-based optimization of Markov decision processes: an empirical process theory approach ⋮ Synaptic dynamics: linear model and adaptation algorithm ⋮ Finding intrinsic rewards by embodied evolution and constrained reinforcement learning ⋮ Risk-Sensitive Reinforcement Learning via Policy Gradient Search ⋮ Variance-constrained actor-critic algorithms for discounted and average reward MDPs ⋮ Smoothing policies and safe policy gradients ⋮ Variational actor-critic algorithms ⋮ A novel online gait optimization approach for biped robots with point-feet ⋮ Geometry and convergence of natural policy gradient methods ⋮ Finding optimal memoryless policies of POMDPs under the expected average reward criterion ⋮ Reinforcement learning algorithms with function approximation: recent advances and applications ⋮ Asymptotic bias of stochastic gradient search ⋮ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies ⋮ Parameterized Markov decision process and its application to service rate control ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Risk-Constrained Reinforcement Learning with Percentile Risk Criteria ⋮ Hessian matrix distribution for Bayesian policy gradient reinforcement learning ⋮ Policy Gradient Approach of Event‐Based Optimization and Its Online Implementation ⋮ A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases ⋮ Basic ideas for event-based optimization of Markov systems ⋮ Model-based reinforcement learning with dimension reduction ⋮ Adaptive critic design with graph Laplacian for online learning control of nonlinear systems ⋮ On-line policy gradient estimation with multi-step sampling ⋮ Policy gradient in Lipschitz Markov decision processes ⋮ Transient-State Natural Gas Transmission in Gunbarrel Pipeline Networks ⋮ Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems ⋮ Risk-averse policy optimization via risk-neutral policy optimization ⋮ Natural actor-critic algorithms ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ Estimation and approximation bounds for gradient-based reinforcement learning