Batch policy learning in average reward Markov decision processes (Q2112817): Difference between revisions

From MaRDI portal
Property / author: Susan A. Murphy
Property / describes a project that uses: L-BFGS
Property / describes a project that uses: Spearmint
Property / MaRDI profile type: MaRDI publication profile
Property / cites work: Learning Algorithms for Markov Decision Processes with Average Cost
Property / cites work: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Property / cites work: Q4277836
Property / cites work: Double/debiased machine learning for treatment and structural parameters
Property / cites work: Doubly robust policy evaluation and optimization
Property / cites work: Q3093261
Property / cites work: Constructing dynamic treatment regimes over indefinite time horizons
Property / cites work: Model selection in reinforcement learning
Property / cites work: Q2834459
Property / cites work: Q4255598
Property / cites work: Q5148951
Property / cites work: Dynamic treatment regimes: technical challenges and applications
Property / cites work: 10.1162/1532443041827907
Property / cites work: Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health
Property / cites work: On the limited memory BFGS method for large scale optimization
Property / cites work: Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators
Property / cites work: Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning
Property / cites work: Q5477863
Property / cites work: The landscape of empirical risk for nonconvex losses
Property / cites work: Q3096132
Property / cites work: Marginal Mean Models for Dynamic Regimes
Property / cites work: Semiparametric efficiency bounds
Property / cites work: Kernel-based reinforcement learning
Property / cites work: Q4315289
Property / cites work: Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
Property / cites work: Support Vector Machines
Property / cites work: Q4626283
Property / cites work: Asymptotic Statistics
Property / cites work: Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes
Property / cites work: A Robust Method for Estimating Optimal Treatment Regimes
Property / cites work: Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
Property / cites work: New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes
Property / cites work: Q4633064

Latest revision as of 06:44, 31 July 2024

Language: English
Label: Batch policy learning in average reward Markov decision processes
Description: scientific article

    Statements

    Batch policy learning in average reward Markov decision processes (English)
    12 January 2023
    Markov decision process
    average reward
    policy optimization
    doubly robust estimator

    Identifiers