Entropy-SGD: biasing gradient descent into wide valleys
MaRDI QID: Q5854121
DOI: 10.1088/1742-5468/ab39d9
zbMath: 1459.65091
arXiv: 1611.01838
OpenAlex: W2552194003
Authors: Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer T. Chayes, Levent Sagun, Riccardo Zecchina
Publication date: 16 March 2021
Published in: Journal of Statistical Mechanics: Theory and Experiment
Full work available at URL: https://arxiv.org/abs/1611.01838
Mathematics Subject Classification: Numerical mathematical programming methods (65K05); Nonconvex programming, global optimization (90C26)
Related Items
- Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry*
- Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration
- Entropy-SGD
- Archetypal landscapes for deep neural networks
- The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
- Wasserstein-Based Projections with Applications to Inverse Problems
- Black holes and the loss landscape in machine learning
- Consistent Sparse Deep Learning: Theory and Computation
- Barcodes as summary of loss function topology
- Run-and-inspect method for nonconvex optimization and global optimality bounds for R-local minimizers
- Diametrical risk minimization: theory and computations
- Singular perturbations in stochastic optimal control with unbounded data
- Markov chain stochastic DCA and applications in deep learning with PDEs regularization
- Pessimistic value iteration for multi-task data sharing in offline reinforcement learning
- Geometric deep learning: a temperature based analysis of graph neural networks
- Dimension-free log-Sobolev inequalities for mixture distributions
- Forward stability of ResNet and its variants
- Ensemble Kalman inversion: a derivative-free technique for machine learning tasks
- Chaos and complexity from quantum neural network. A study with diffusion metric in machine learning
- Bias of homotopic gradient descent for the hinge loss
- On Bayesian posterior mean estimators in imaging sciences and Hamilton-Jacobi partial differential equations
- Deep relaxation: partial differential equations for optimizing deep neural networks
- Interpretable machine learning: fundamental principles and 10 grand challenges
- A spin glass model for the loss surfaces of generative adversarial networks
- Global Minima of Overparameterized Neural Networks
- Building a telescope to look into high-dimensional image spaces
- Entropic gradient descent algorithms and wide flat minima*
Uses Software
Cites Work
- Mutual information, metric entropy and cumulative relative entropy risk
- Langevin diffusions and Metropolis-Hastings algorithms
- Replica symmetry breaking condition exposed by random matrix calculation of landscape complexity
- Flat Minima
- Local entropy as a measure for sampling solutions in constraint satisfaction problems
- Acceleration of Stochastic Approximation by Averaging
- DOI: 10.1162/153244302760200704