The following pages link to (Q4533363):
Displaying 12 items.
- Active inference and agency: optimal control without cost functions (Q353847)
- Finding optimal memoryless policies of POMDPs under the expected average reward criterion (Q418072)
- Analysis and improvement of policy gradient estimation (Q448295)
- The factored policy-gradient planner (Q835832)
- Basic ideas for event-based optimization of Markov systems (Q1773104)
- Structured prediction with reinforcement learning (Q1959534)
- Does lifelong learning affect mobile robot evolution? (Q2091556)
- A tutorial on the cross-entropy method (Q2485925)
- ARMed SPHINCS (Q2798787)
- ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans (Q3303936)
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies (Q5139670)
- On-line policy gradient estimation with multi-step sampling (Q5962027)