scientific article; zbMATH DE number 1753152

From MaRDI portal

Publication:4533362

zbMath: 0994.68119; MaRDI QID: Q4533362

Jonathan Baxter, Peter L. Bartlett

Publication date: 13 October 2002


Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.



Related Items (36)

An incremental off-policy search in a model-free Markov decision process using a single sample path
The factored policy-gradient planner
A policy gradient method for semi-Markov decision processes with application to call admission control
A stochastic policy search model for matching behavior
Queueing Network Controls via Deep Reinforcement Learning
Simulation-based optimization of Markov decision processes: an empirical process theory approach
Synaptic dynamics: linear model and adaptation algorithm
Finding intrinsic rewards by embodied evolution and constrained reinforcement learning
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Smoothing policies and safe policy gradients
Variational actor-critic algorithms
A novel online gait optimization approach for biped robots with point-feet
Geometry and convergence of natural policy gradient methods
Finding optimal memoryless policies of POMDPs under the expected average reward criterion
Reinforcement learning algorithms with function approximation: recent advances and applications
Asymptotic bias of stochastic gradient search
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
Parameterized Markov decision process and its application to service rate control
Unnamed Item
Unnamed Item
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
Hessian matrix distribution for Bayesian policy gradient reinforcement learning
Policy Gradient Approach of Event‐Based Optimization and Its Online Implementation
A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases
Basic ideas for event-based optimization of Markov systems
Model-based reinforcement learning with dimension reduction
Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
On-line policy gradient estimation with multi-step sampling
Policy gradient in Lipschitz Markov decision processes
Transient-State Natural Gas Transmission in Gunbarrel Pipeline Networks
Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
Risk-averse policy optimization via risk-neutral policy optimization
Natural actor-critic algorithms
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Estimation and approximation bounds for gradient-based reinforcement learning






