Pessimistic value iteration for multi-task data sharing in offline reinforcement learning

Publication:6152665