Activation function design for deep networks: linearity and effective initialisation
Recommendations
- On the effect of the activation function on the distribution of hidden nodes in a deep network
- Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean
- Exact solutions of a deep linear network
- A weight initialization based on the linear product structure for neural networks
- A new initialization method based on normed statistical spaces in deep networks
Cites work
- scientific article; zbMATH DE number 1324223 (no title available)
- scientific article; zbMATH DE number 194922 (no title available)
- Accurate Prediction of Phase Transitions in Compressed Sensing via a Connection to Minimax Denoising
- Bayesian learning for neural networks
- Distributions
- High dimensional robust M-estimation: asymptotic variance via approximate message passing
- Learning representations by back-propagating errors
- On the Lambert \(W\) function
- One-sided inference about functionals of a density
- Robust Estimation of a Location Parameter
- Robust Statistics
- Stable architectures for deep neural networks
- Statistical decision theory and Bayesian analysis. 2nd ed
- The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions
- Variation Diminishing Transformations: A Direct Approach to Total Positivity and its Statistical Applications
Cited in (4)
- Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean
- A survey on modern trainable activation functions
- A decoupled physics-informed neural network for recovering a space-dependent force function in the wave equation from integral overdetermination data
- Principles for initialization and architecture selection in graph neural networks with ReLU activations