Provably efficient offline reinforcement learning with trajectory-wise reward

From MaRDI portal
Publication:6670141