Q4999029 (Q4999029): Difference between revisions

From MaRDI portal
Import240304020342
Set profile property.
ReferenceBot
Changed an Item
 
Property / cites work: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path / rank: Normal rank
Property / cites work: Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality / rank: Normal rank
Property / cites work: Q5405224 / rank: Normal rank
Property / cites work: Q4387224 / rank: Normal rank
Property / cites work: First-Order Methods in Optimization / rank: Normal rank
Property / cites work: Functional Approximations and Dynamic Programming / rank: Normal rank
Property / cites work: Q4257216 / rank: Normal rank
Property / cites work: Natural actor-critic algorithms / rank: Normal rank
Property / cites work: The Łojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems / rank: Normal rank
Property / cites work: 10.1162/153244303765208377 / rank: Normal rank
Property / cites work: Prediction, Learning, and Games / rank: Normal rank
Property / cites work: Online Markov Decision Processes / rank: Normal rank
Property / cites work: A decision-theoretic generalization of on-line learning and an application to boosting / rank: Normal rank
Property / cites work: Accelerated gradient methods for nonconvex nonlinear and stochastic programming / rank: Normal rank
Property / cites work: Random design analysis of ridge regression / rank: Normal rank
Property / cites work: Q5791470 / rank: Normal rank
Property / cites work: Near-optimal reinforcement learning in polynomial time / rank: Normal rank
Property / cites work: Q2810787 / rank: Normal rank
Property / cites work: Q3967358 / rank: Normal rank
Property / cites work: Cubic regularization of Newton method and its global performance / rank: Normal rank
Property / cites work: Q5744816 / rank: Normal rank
Property / cites work: Understanding Machine Learning / rank: Normal rank
Property / cites work: Online Learning and Online Convex Optimization / rank: Normal rank
Property / cites work: Simple statistical gradient-following algorithms for connectionist reinforcement learning / rank: Normal rank

Latest revision as of 04:17, 26 July 2024

scientific article; zbMATH DE number 7370615
Language: English
Label: No label defined
Description: scientific article; zbMATH DE number 7370615
Also known as: (none listed)

    Statements

    9 July 2021
    policy gradient
    reinforcement learning

    Identifiers