The following pages link to On Actor-Critic Algorithms (Q4443033):
Displaying 3 items.
- Immediate return preference emerged from a synaptic learning rule for return maximization (Q889365)
- Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning (Q5189863)
- Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule (Q5440969)