An incremental off-policy search in a model-free Markov decision process using a single sample path (Q1621868): Difference between revisions

From MaRDI portal
Import240304020342
Set profile property.
ReferenceBot
Changed an Item
(2 intermediate revisions by 2 users not shown)
Property / OpenAlex ID
 
Property / OpenAlex ID: W2963057120 / rank
 
Normal rank
Property / arXiv ID
 
Property / arXiv ID: 1801.10287 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment / rank
 
Normal rank
Property / cites work
 
Property / cites work: Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path / rank
 
Normal rank
Property / cites work
 
Property / cites work: Policy Iteration Based on Stochastic Factorization / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4533362 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4368722 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Adaptive aggregation methods for infinite horizon dynamic programming / rank
 
Normal rank
Property / cites work
 
Property / cites work: Natural actor-critic algorithms / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q3527701 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Simulation-based algorithms for Markov decision processes / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q2934010 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Handbook of Markov decision processes. Methods and applications / rank
 
Normal rank
Property / cites work
 
Property / cites work: Importance Sampling for Stochastic Simulations / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4422978 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4325914 / rank
 
Normal rank
Property / cites work
 
Property / cites work: A Model Reference Adaptive Search Method for Global Optimization / rank
 
Normal rank
Property / cites work
 
Property / cites work: A Stochastic Approximation Framework for a Class of Randomized Optimization Algorithms / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4576234 / rank
 
Normal rank
Property / cites work
 
Property / cites work: On Actor-Critic Algorithms / rank
 
Normal rank
Property / cites work
 
Property / cites work: The cross-entropy method for continuous multi-extremal optimization / rank
 
Normal rank
Property / cites work
 
Property / cites work: Optimal adaptive controllers for unknown Markov chains / rank
 
Normal rank
Property / cites work
 
Property / cites work: 10.1162/1532443041827907 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Basis function adaptation in temporal difference reinforcement learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: Acceleration of Stochastic Approximation by Averaging / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4315289 / rank
 
Normal rank
Property / cites work
 
Property / cites work: The cross-entropy method for combinatorial and continuous optimization / rank
 
Normal rank
Property / cites work
 
Property / cites work: Cross-entropy and rare events for maximal cut and partition problems / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4828558 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Learning control of finite Markov chains with unknown transition probabilities / rank
 
Normal rank
Property / cites work
 
Property / cites work: Learning control of finite Markov chains with an explicit trade-off between estimation and control / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q5477862 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation / rank
 
Normal rank
Property / cites work
 
Property / cites work: An analysis of temporal-difference learning with function approximation / rank
 
Normal rank
Property / cites work
 
Property / cites work: On diagonal dominance arguments for bounding \(\| A^{-1}\|_\infty\) / rank
 
Normal rank
Property / cites work
 
Property / cites work: Parameter Estimation for ODEs Using a Cross-Entropy Approach / rank
 
Normal rank
Property / cites work
 
Property / cites work: A note on entrywise perturbation theory for Markov chains / rank
 
Normal rank
Property / cites work
 
Property / cites work: Least Squares Temporal Difference Methods: An Analysis under General Conditions / rank
 
Normal rank
Property / cites work
 
Property / cites work: Model-based search for combinatorial optimization: A critical survey / rank
 
Normal rank

Latest revision as of 07:30, 17 July 2024

scientific article
Language Label Description Also known as
English
An incremental off-policy search in a model-free Markov decision process using a single sample path
scientific article

    Statements

    An incremental off-policy search in a model-free Markov decision process using a single sample path (English)
    12 November 2018
    Markov decision process
    off-policy prediction
    control problem
    stochastic approximation method
    cross entropy method
    linear function approximation
    ODE method
    global optimization

    Identifiers