An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions (Q5380403): Difference between revisions

From MaRDI portal
Property / DOI
 
Property / DOI: 10.1162/NECO_a_00808 / rank
Normal rank
 
Property / cites work
 
Property / cites work: Online Markov Decision Processes / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q2921693 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Logarithmic Regret Algorithms for Online Convex Optimization / rank
 
Normal rank
Property / cites work
 
Property / cites work: Efficient algorithms for online decision problems / rank
 
Normal rank
Property / cites work
 
Property / cites work: An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions / rank
 
Normal rank
Property / cites work
 
Property / cites work: Online Markov Decision Processes Under Bandit Feedback / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4626283 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Simple statistical gradient-following algorithms for connectionist reinforcement learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: Markov Decision Processes with Arbitrary Reward Processes / rank
 
Normal rank
Property / full work available at URL
 
Property / full work available at URL: https://doi.org/10.1162/neco_a_00808 / rank
 
Normal rank
Property / OpenAlex ID
 
Property / OpenAlex ID: W2225522132 / rank
 
Normal rank
Property / DBLP publication ID
 
Property / DBLP publication ID: journals/neco/MaZHS16 / rank
 
Normal rank
Property / DOI
 
Property / DOI: 10.1162/NECO_A_00808 / rank
 
Normal rank
Property / Recommended article
 
Property / Recommended article: Online Learning in Markov Decision Processes with Continuous Actions / rank
 
Normal rank
Property / Recommended article: Online Learning in Markov Decision Processes with Continuous Actions / qualifier
 
Similarity Score: 0.92699593
 
Property / Recommended article
 
Property / Recommended article: Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes / rank
 
Normal rank
Property / Recommended article: Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes / qualifier
 
Similarity Score: 0.91683835
 
Property / Recommended article
 
Property / Recommended article: Policy Gradient for Continuing Tasks in Discounted Markov Decision Processes / rank
 
Normal rank
Property / Recommended article: Policy Gradient for Continuing Tasks in Discounted Markov Decision Processes / qualifier
 
Similarity Score: 0.90851086
 
Property / Recommended article
 
Property / Recommended article: A basic formula for online policy gradient algorithms / rank
 
Normal rank
Property / Recommended article: A basic formula for online policy gradient algorithms / qualifier
 
Similarity Score: 0.9016662
 
Property / Recommended article
 
Property / Recommended article: An online actor-critic algorithm with function approximation for constrained Markov decision processes / rank
 
Normal rank
Property / Recommended article: An online actor-critic algorithm with function approximation for constrained Markov decision processes / qualifier
 
Similarity Score: 0.8997418
 
Property / Recommended article
 
Property / Recommended article: Q3093369 / rank
 
Normal rank
Property / Recommended article: Q3093369 / qualifier
 
Similarity Score: 0.8991422
 
Property / Recommended article
 
Property / Recommended article: Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes / rank
 
Normal rank
Property / Recommended article: Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes / qualifier
 
Similarity Score: 0.89336705
 
Property / Recommended article
 
Property / Recommended article: Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives / rank
 
Normal rank
Property / Recommended article: Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives / qualifier
 
Similarity Score: 0.89125407
 

Latest revision as of 13:22, 4 April 2025

Language: English
Label: An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
Description: scientific article; zbMATH DE number 7062532

    Statements

    An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions (English)
    4 June 2019

    Identifiers