Batch policy learning in average reward Markov decision processes (Q2112817)

From MaRDI portal
scientific article

    Statements

    Batch policy learning in average reward Markov decision processes (English)
    12 January 2023
    Markov decision process
    average reward
    policy optimization
    doubly robust estimator