Batch policy learning in average reward Markov decision processes (Q2112817)

From MaRDI portal
scientific article

    Statements

    Batch policy learning in average reward Markov decision processes (English)
    12 January 2023
    Markov decision process
    average reward
    policy optimization
    doubly robust estimator

    Identifiers