Entropy-SGD: biasing gradient descent into wide valleys

DOI: 10.1088/1742-5468/ab39d9
zbMATH: 1459.65091
arXiv: 1611.01838
OpenAlex: W2552194003
MaRDI QID: Q5854121

Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer T. Chayes, Levent Sagun, Riccardo Zecchina

Publication date: 16 March 2021

Published in: Journal of Statistical Mechanics: Theory and Experiment

Full work available at URL: https://arxiv.org/abs/1611.01838
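
For context, the catalogued paper replaces the training loss f(x) with a smoothed "local entropy" objective F(x; gamma) = log ∫ exp(−f(x′) − (gamma/2)‖x − x′‖²) dx′, whose gradient −gamma(x − ⟨x′⟩) is estimated by an inner stochastic gradient Langevin dynamics (SGLD) loop. Below is a minimal NumPy sketch of that two-loop structure, assuming a user-supplied stochastic gradient oracle grad_f; the function name and all hyperparameter values are illustrative defaults, not the authors' reference implementation.

import numpy as np

def entropy_sgd_step(x, grad_f, gamma=0.03, eta=0.1, eta_prime=0.1,
                     L=20, eps=1e-4, alpha=0.75, rng=np.random):
    """One outer Entropy-SGD update (sketch of the two-loop scheme).

    x        : current parameters (1-D array)
    grad_f   : stochastic gradient oracle, grad_f(x) ~ grad f(x) on a minibatch
    gamma    : "scope" of the local-entropy smoothing
    L        : number of inner SGLD steps used to estimate <x'>
    eps      : thermal-noise scale of the inner Langevin dynamics
    alpha    : weight of the exponential running average mu ~ <x'>
    """
    x_prime = x.copy()
    mu = x.copy()
    for _ in range(L):
        # SGLD step targeting the Gibbs measure
        # proportional to exp(-f(x') - (gamma/2)||x - x'||^2)
        dx = grad_f(x_prime) - gamma * (x - x_prime)
        x_prime = (x_prime - eta_prime * dx
                   + np.sqrt(eta_prime) * eps * rng.standard_normal(x.shape))
        mu = (1 - alpha) * mu + alpha * x_prime  # running estimate of <x'>
    # outer update: the gradient of -F(x) is gamma * (x - <x'>)
    return x - eta * gamma * (x - mu)

As a quick smoke test on a toy quadratic loss, x = entropy_sgd_step(np.ones(10), lambda z: z) moves the iterate toward the minimizer at the origin.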



Related Items

Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry*
Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration
Entropy-SGD
Archetypal landscapes for deep neural networks
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
Wasserstein-Based Projections with Applications to Inverse Problems
Black holes and the loss landscape in machine learning
Consistent Sparse Deep Learning: Theory and Computation
Barcodes as summary of loss function topology
Run-and-inspect method for nonconvex optimization and global optimality bounds for R-local minimizers
Diametrical risk minimization: theory and computations
Singular perturbations in stochastic optimal control with unbounded data
Markov chain stochastic DCA and applications in deep learning with PDEs regularization
Pessimistic value iteration for multi-task data sharing in offline reinforcement learning
Geometric deep learning: a temperature based analysis of graph neural networks
Dimension-free log-Sobolev inequalities for mixture distributions
Forward stability of ResNet and its variants
Ensemble Kalman inversion: a derivative-free technique for machine learning tasks
Chaos and complexity from quantum neural network. A study with diffusion metric in machine learning
Bias of homotopic gradient descent for the hinge loss
On Bayesian posterior mean estimators in imaging sciences and Hamilton-Jacobi partial differential equations
Deep relaxation: partial differential equations for optimizing deep neural networks
Interpretable machine learning: fundamental principles and 10 grand challenges
A spin glass model for the loss surfaces of generative adversarial networks
Global Minima of Overparameterized Neural Networks
Unnamed Item
Building a telescope to look into high-dimensional image spaces
Entropic gradient descent algorithms and wide flat minima*

