Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)

Publication:6342704