Batch policy learning in average reward Markov decision processes (Q2112817): Difference between revisions

From MaRDI portal
RedirectionBot
Changed an Item
Property / describes a project that uses
 
Property / describes a project that uses: L-BFGS / rank: Normal rank

Revision as of 19:13, 28 February 2024

Language: English
Label: Batch policy learning in average reward Markov decision processes
Description: scientific article
Also known as:

    Statements

    Batch policy learning in average reward Markov decision processes (English)
    12 January 2023
    Markov decision process
    average reward
    policy optimization
    doubly robust estimator

    Identifiers