scientific article; zbMATH DE number 1753152
Publication:4533362
zbMath 0994.68119, MaRDI QID Q4533362
Jonathan Baxter, Peter L. Bartlett
Publication date: 13 October 2002
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Related Items
An incremental off-policy search in a model-free Markov decision process using a single sample path, The factored policy-gradient planner, A policy gradient method for semi-Markov decision processes with application to call admission control, A stochastic policy search model for matching behavior, Queueing Network Controls via Deep Reinforcement Learning, Simulation-based optimization of Markov decision processes: an empirical process theory approach, Synaptic dynamics: linear model and adaptation algorithm, Finding intrinsic rewards by embodied evolution and constrained reinforcement learning, Risk-Sensitive Reinforcement Learning via Policy Gradient Search, Variance-constrained actor-critic algorithms for discounted and average reward MDPs, Smoothing policies and safe policy gradients, Variational actor-critic algorithms, A novel online gait optimization approach for biped robots with point-feet, Geometry and convergence of natural policy gradient methods, Finding optimal memoryless policies of POMDPs under the expected average reward criterion, Reinforcement learning algorithms with function approximation: recent advances and applications, Asymptotic bias of stochastic gradient search, Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, Parameterized Markov decision process and its application to service rate control, Unnamed Item, Unnamed Item, Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, Hessian matrix distribution for Bayesian policy gradient reinforcement learning, Policy Gradient Approach of Event‐Based Optimization and Its Online Implementation, A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases, Basic ideas for event-based optimization of Markov systems, Model-based reinforcement learning with dimension reduction, Adaptive critic design with graph Laplacian for online learning control of nonlinear systems,
On-line policy gradient estimation with multi-step sampling, Policy gradient in Lipschitz Markov decision processes, Transient-State Natural Gas Transmission in Gunbarrel Pipeline Networks, Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems, Risk-averse policy optimization via risk-neutral policy optimization, Natural actor-critic algorithms, Multi-agent reinforcement learning: a selective overview of theories and algorithms, Estimation and approximation bounds for gradient-based reinforcement learning