The following pages link to On Actor-Critic Algorithms (Q4443033):
Displaying 50 items.
- A constrained optimization perspective on actor-critic algorithms and application to network routing (Q286519)
- Multiscale Q-learning with linear function approximation (Q312650)
- Efficient model-based reinforcement learning for approximate online optimal control (Q340682)
- Dynamic treatment regimes: technical challenges and applications (Q405345)
- An online actor-critic algorithm with function approximation for constrained Markov decision processes (Q438776)
- Stabilization of stochastic approximation by step size adaptation (Q450652)
- Autonomous reinforcement learning with experience replay (Q461126)
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes (Q616967)
- A new learning algorithm for optimal stopping (Q839001)
- An adaptive actor-critic algorithm with multi-step simulated experiences for controlling nonholonomic mobile robots (Q855223)
- Actor-critic algorithms for hierarchical Markov decision processes (Q856510)
- Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms (Q862224)
- Tutorial series on brain-inspired computing. IV: Reinforcement learning: machine learning and natural learning (Q867508)
- Immediate return preference emerged from a synaptic learning rule for return maximization (Q889365)
- Model-based reinforcement learning for approximate optimal regulation (Q899267)
- Reinforcement learning algorithms with function approximation: recent advances and applications (Q903601)
- Reinforcement learning in the brain (Q1042310)
- Natural actor-critic algorithms (Q1049136)
- Estimation and approximation bounds for gradient-based reinforcement learning (Q1604222)
- An incremental off-policy search in a model-free Markov decision process using a single sample path (Q1621868)
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs (Q1689603)
- Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains (Q1699932)
- Asymptotic bias of stochastic gradient search (Q1704136)
- Reinforcement learning for a class of continuous-time input constrained optimal control problems (Q1716659)
- Real-time reinforcement learning by sequential actor-critics and experience replay (Q1784532)
- Convergence rate of linear two-time-scale stochastic approximation. (Q1879892)
- Stochastic optimization for real time service capacity allocation under random service demand (Q1931638)
- Finding intrinsic rewards by embodied evolution and constrained reinforcement learning (Q1932114)
- Approximate stochastic annealing for online control of infinite horizon Markov decision processes (Q1937498)
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm (Q1945130)
- Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains (Q1959632)
- Sell or store? An ADP approach to marketing renewable energy (Q2011830)
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling (Q2051259)
- Fundamental design principles for reinforcement learning algorithms (Q2094028)
- Mixed density methods for approximate dynamic programming (Q2094030)
- Multi-agent reinforcement learning: a selective overview of theories and algorithms (Q2094040)
- Neural circuits for learning context-dependent associations of stimuli (Q2182880)
- Weak convergence of dynamical systems in two timescales (Q2203452)
- Control strategy of speed servo systems based on deep reinforcement learning (Q2283851)
- TD-regularized actor-critic methods (Q2320580)
- Performance optimization for a class of generalized stochastic Petri nets (Q2348377)
- Reinforcement learning for a biped robot based on a CPG-actor-critic method (Q2383520)
- A tutorial on the cross-entropy method (Q2485925)
- Linear stochastic approximation driven by slowly varying Markov chains (Q2503529)
- An actor-critic algorithm for constrained Markov decision processes (Q2504518)
- Dynamic programming and suboptimal control: a survey from ADP to MPC (Q2511993)
- Reinforcement learning based algorithms for average cost Markov decision processes (Q2643632)
- Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement (Q2692526)
- On Generalized Bellman Equations and Temporal-Difference Learning (Q3305109)
- A Spiking Neural Network Model of an Actor-Critic Learning Agent (Q3612121)