Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning (Q5189863): Difference between revisions

From MaRDI portal
Added link to MaRDI item.
Created claim: DBLP publication ID (P1635): journals/neco/MorimuraUYPD10, #quickstatements; #temporary_batch_1731483406851
 
(4 intermediate revisions by 4 users not shown)
Property / MaRDI profile type
 
Property / MaRDI profile type: MaRDI publication profile / rank
 
Normal rank
Property / full work available at URL
 
Property / full work available at URL: https://doi.org/10.1162/neco.2009.12-08-922 / rank
 
Normal rank
Property / OpenAlex ID
 
Property / OpenAlex ID: W1967459934 / rank
 
Normal rank
Property / Wikidata QID
 
Property / Wikidata QID: Q51782240 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4368722 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4257216 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Technical update: Least-squares temporal difference learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: Linear least-squares algorithms for temporal difference learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: On Actor-Critic Algorithms / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4457477 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: How to optimize discrete-event systems from a single sample path by the score function method / rank
 
Normal rank
Property / cites work
 
Property / cites work: Average cost temporal-difference learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: On average versus discounted reward temporal-difference learning / rank
 
Normal rank
Property / DBLP publication ID
 
Property / DBLP publication ID: journals/neco/MorimuraUYPD10 / rank
 
Normal rank

Latest revision as of 09:26, 13 November 2024

scientific article; zbMATH DE number 5680295
Language: English
Label: Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Description: scientific article; zbMATH DE number 5680295

    Statements

    Identifiers