Cited in (18)
- Binary quantized network training with sharpness-aware minimization
- A statistician teaches deep learning
- Megatron-LM
- GhostNet
- Europarl
- GShard
- M2M-100
- Mesh TensorFlow
- EGC: entropy-based gradient compression for distributed deep learning
- scientific article; zbMATH DE number 7370624
- BiT
- REALM
- On the convergence analysis of asynchronous SGD for solving consistent linear systems
- The stochastic delta rule: faster and more accurate deep learning through adaptive weight noise
- Associated learning: decomposing end-to-end backpropagation based on autoencoders and target propagation
- Deep double descent: where bigger models and more data hurt*
- mT5
- DeepSpeed
This page was built for software: GPipe