Consistent Sparse Deep Learning: Theory and Computation
Abstract: Deep learning has been the engine powering many successes of data science. However, the deep neural network (DNN), as the basic model of deep learning, is often excessively over-parameterized, causing many difficulties in training, prediction and interpretation. We propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework: the proposed method could learn a sparse DNN with at most \(O(n/\log(n))\) connections and nice theoretical guarantees such as posterior consistency, variable selection consistency and asymptotically optimal generalization bounds. In particular, we establish posterior consistency for the sparse DNN with a mixture Gaussian prior, show that the structure of the sparse DNN can be consistently determined using a Laplace approximation-based marginal posterior inclusion probability approach, and use Bayesian evidence to elicit sparse DNNs learned by an optimization method such as stochastic gradient descent in multiple runs with different initializations. The proposed method is computationally more efficient than standard Bayesian methods for large-scale sparse DNNs. The numerical results indicate that the proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection, both advancing interpretable machine learning.
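As a rough illustration of the idea in the abstract, the sketch below scores each connection weight under a two-component mixture Gaussian prior and prunes the weights that are more likely to come from the narrow component. It is a simplified stand-in, not the paper's Laplace approximation-based procedure. The zero-mean form of both components, the function names (`inclusion_probability`, `sparsify`), and the values of `lam`, `sigma0`, `sigma1` and `cutoff` are all assumptions made for this example.

```python
import math


def inclusion_probability(w, lam, sigma0, sigma1):
    """Probability that weight w belongs to the wide N(0, sigma1^2) component
    of the assumed prior lam * N(0, sigma1^2) + (1 - lam) * N(0, sigma0^2).

    Computed from log densities so that very large or very small weights
    do not underflow.
    """
    log_wide = math.log(lam) - math.log(sigma1) - 0.5 * (w / sigma1) ** 2
    log_narrow = math.log(1.0 - lam) - math.log(sigma0) - 0.5 * (w / sigma0) ** 2
    log_odds = log_wide - log_narrow
    # Numerically stable logistic function of the log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def sparsify(weights, lam=1e-4, sigma0=0.01, sigma1=1.0, cutoff=0.5):
    """Keep a weight only if its inclusion probability exceeds the cutoff;
    otherwise set it to zero (hypothetical hyperparameter defaults)."""
    return [
        w if inclusion_probability(w, lam, sigma0, sigma1) > cutoff else 0.0
        for w in weights
    ]


if __name__ == "__main__":
    trained = [0.001, -0.02, 0.5, -1.3]
    print(sparsify(trained))
```

With these illustrative hyperparameters, weights close to zero are attributed to the narrow component and removed, while larger weights are retained. In practice the hyperparameters would need to be chosen to match the theory and the data.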
Recommendations
- Learning sparse deep neural networks with a spike-and-slab prior
- On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
- Deep Learning as Sparsity-Enforcing Algorithms
- Sparse Bayesian deep learning for dynamic system identification
- Sparse deep neural networks using \(L_{1,\infty}\)-weight normalization
Cites work
- scientific article; zbMATH DE number 6378127
- scientific article; zbMATH DE number 4215168
- scientific article; zbMATH DE number 1034042
- An introduction to variational methods for graphical models
- Bayesian Subset Modeling for High-Dimensional Generalized Linear Models
- Bayesian estimation of sparse signals with a continuous spike-and-slab prior
- Bayesian neural networks for selection of drug sensitive genes
- Bayesian variable selection for high dimensional generalized linear models: convergence rates of the fitted densities
- Convergence rates of posterior distributions.
- Deep double descent: where bigger models and more data hurt
- Entropy-SGD: biasing gradient descent into wide valleys
- Error bounds for approximations with deep ReLU networks
- Evidence Evaluation for Bayesian Neural Networks Using Contour Monte Carlo
- Model selection in Bayesian neural networks via horseshoe priors
- Nearly optimal Bayesian shrinkage for high-dimensional regression
- Nonparametric regression using deep neural networks with ReLU activation function
- On deep learning as a remedy for the curse of dimensionality in nonparametric regression
- Optimal approximation of piecewise smooth functions using deep ReLU neural networks
- Optimal approximation with sparsely connected deep neural networks
- Some PAC-Bayesian theorems
- Sparse graphical models for exploring gene expression data
- Spike and slab variable selection: frequentist and Bayesian strategies
- Transformed \(\ell_1\) regularization for learning sparse deep neural networks
- Weak Convergence Rates of Population Versus Single-Chain Stochastic Approximation MCMC Algorithms
Cited in (24)
- Posterior concentrations of fully-connected Bayesian neural networks with general priors on the weights
- Sparse deep neural networks using \(L_{1,\infty}\)-weight normalization
- Dynamic sparse method for deep learning execution
- A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks
- Transformed \(\ell_1\) regularization for learning sparse deep neural networks
- Sparse kernel deep stacking networks
- Extended fiducial inference for individual treatment effects via deep neural networks
- Bayesian autoencoders for data-driven discovery of coordinates, governing equations and fundamental constants
- A new paradigm for generative adversarial networks based on randomized decision rules
- Learning with Structured Sparsity
- Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding
- Word-Level Maximum Mean Discrepancy Regularization for Word Embedding
- On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
- Nonconvex Sparse Regularization for Deep Neural Networks and Its Optimality
- Sparse Bayesian deep learning for dynamic system identification
- SSN: learning sparse switchable normalization via SparsestMax
- Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds
- scientific article; zbMATH DE number 7626756
- Bayesian scalar-on-image regression with a spatially varying single-layer neural network prior
- Deep Learning as Sparsity-Enforcing Algorithms
- Inference, learning and attention mechanisms that exploit and preserve sparsity in CNNs
- Extended fiducial inference: toward an automated process of statistical inference
- Learning sparse deep neural networks with a spike-and-slab prior
- Deep learning: a Bayesian perspective