An incremental off-policy search in a model-free Markov decision process using a single sample path (Q1621868): Difference between revisions

@@ Property / describes a project that uses @@
+PILCO
@@ Property / describes a project that uses: PILCO / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W2963057120
@@ Property / OpenAlex ID: W2963057120 / rank @@
+Normal rank
@@ Property / arXiv ID @@
+1801.10287
@@ Property / arXiv ID: 1801.10287 / rank @@
+Normal rank
@@ Property / cites work @@
+Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment
+Normal rank
@@ Property / cites work @@
+Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
+Normal rank
@@ Property / cites work @@
+Policy Iteration Based on Stochastic Factorization
+Normal rank
@@ Property / cites work @@
+Q4533362
@@ Property / cites work: Q4533362 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4368722
@@ Property / cites work: Q4368722 / rank @@
+Normal rank
@@ Property / cites work @@
+Adaptive aggregation methods for infinite horizon dynamic programming
+Normal rank
@@ Property / cites work @@
+Natural actor-critic algorithms
@@ Property / cites work: Natural actor-critic algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Q3527701
@@ Property / cites work: Q3527701 / rank @@
+Normal rank
@@ Property / cites work @@
+Simulation-based algorithms for Markov decision processes
+Normal rank
@@ Property / cites work @@
+Q2934010
@@ Property / cites work: Q2934010 / rank @@
+Normal rank
@@ Property / cites work @@
+Handbook of Markov decision processes. Methods and applications
+Normal rank
@@ Property / cites work @@
+Importance Sampling for Stochastic Simulations
@@ Property / cites work: Importance Sampling for Stochastic Simulations / rank @@
+Normal rank
@@ Property / cites work @@
+Q4422978
@@ Property / cites work: Q4422978 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4325914
@@ Property / cites work: Q4325914 / rank @@
+Normal rank
@@ Property / cites work @@
+A Model Reference Adaptive Search Method for Global Optimization
+Normal rank
@@ Property / cites work @@
+A Stochastic Approximation Framework for a Class of Randomized Optimization Algorithms
+Normal rank
@@ Property / cites work @@
+Q4576234
@@ Property / cites work: Q4576234 / rank @@
+Normal rank
@@ Property / cites work @@
+On Actor-Critic Algorithms
@@ Property / cites work: On Actor-Critic Algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+The cross-entropy method for continuous multi-extremal optimization
+Normal rank
@@ Property / cites work @@
+Optimal adaptive controllers for unknown Markov chains
+Normal rank
@@ Property / cites work @@
+10.1162/1532443041827907
@@ Property / cites work: 10.1162/1532443041827907 / rank @@
+Normal rank
@@ Property / cites work @@
+Basis function adaptation in temporal difference reinforcement learning
+Normal rank
@@ Property / cites work @@
+Acceleration of Stochastic Approximation by Averaging
+Normal rank
@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
+The cross-entropy method for combinatorial and continuous optimization
+Normal rank
@@ Property / cites work @@
+Cross-entropy and rare events for maximal cut and partition problems
+Normal rank
@@ Property / cites work @@
+Q4828558
@@ Property / cites work: Q4828558 / rank @@
+Normal rank
@@ Property / cites work @@
+Learning control of finite Markov chains with unknown transition probabilities
+Normal rank
@@ Property / cites work @@
+Learning control of finite Markov chains with an explicit trade-off between estimation and control
+Normal rank
@@ Property / cites work @@
+Q5477862
@@ Property / cites work: Q5477862 / rank @@
+Normal rank
@@ Property / cites work @@
+Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
+Normal rank
@@ Property / cites work @@
+An analysis of temporal-difference learning with function approximation
+Normal rank
@@ Property / cites work @@
+On diagonal dominance arguments for bounding \(\| A^{-1}\|_\infty\)
+Normal rank
@@ Property / cites work @@
+Parameter Estimation for ODEs Using a Cross-Entropy Approach
+Normal rank
@@ Property / cites work @@
+A note on entrywise perturbation theory for Markov chains
+Normal rank
@@ Property / cites work @@
+Least Squares Temporal Difference Methods: An Analysis under General Conditions
+Normal rank
@@ Property / cites work @@
+Model-based search for combinatorial optimization: A critical survey
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:1621868