Bayesian distillation of deep learning models (Q2069701)

scientific article; zbMATH DE number 7461105

      Statements

      Bayesian distillation of deep learning models (English)
      21 January 2022
      The authors present a Bayesian approach to knowledge distillation in teacher-student networks. Knowledge distillation was first proposed by \textit{G. Hinton} et al. [``Distilling the knowledge in a neural network'', Preprint, \url{arXiv:1503.02531}]: a large teacher network is trained on ground-truth labels, and a smaller student model is then trained on the teacher's outputs, which serve as ``soft targets''. The present work extends this teacher-student framework by arguing that the parameters of the student network can be initialized from those of the teacher. Since the teacher is usually larger than the student, the authors propose to prune the teacher network so that it matches the student's architecture; the pruned weights then provide the student's initialization. Under the assumption that the posterior distribution of the teacher's parameters is Gaussian, the authors prove that the posterior of the pruned teacher network is also Gaussian.
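      To illustrate the initialization step described above, here is a minimal sketch (not the authors' code) of pruning a teacher layer down to a student layer of smaller width and reusing the surviving weights as the student's initialization. The layer sizes and the magnitude-based pruning criterion are assumptions made for this example only.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher layer: 784 inputs -> 512 hidden units.
W_teacher = rng.normal(size=(512, 784))
b_teacher = rng.normal(size=512)

# The student layer has only 128 hidden units, so keep the 128 teacher
# units (rows of W_teacher) with the largest weight norm -- one possible
# relevance score; the paper's actual pruning criterion may differ.
n_student = 128
scores = np.linalg.norm(W_teacher, axis=1)
keep = np.sort(np.argsort(scores)[-n_student:])

# The pruned teacher weights provide the student's initialization.
W_student_init = W_teacher[keep]
b_student_init = b_teacher[keep]

print(W_student_init.shape, b_student_init.shape)  # (128, 784) (128,)
\end{verbatim}
      Since the retained parameters form a sub-vector of the teacher's parameters, a Gaussian posterior over the teacher's weights marginalizes to a Gaussian posterior over the pruned network, which is consistent with the Gaussian property stated in the review.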
      deep learning
      Bayesian methods
      knowledge distillation
      model selection
      Bayesian inference

      Identifiers