Flat Minima
From MaRDI portal
Publication: 3123284
DOI: 10.1162/NECO.1997.9.1.1 · zbMath: 0872.68150 · OpenAlex: W2912811302 · Wikidata: Q34422981 · Scholia: Q34422981 · MaRDI QID: Q3123284
Sepp Hochreiter, Jürgen Schmidhuber
Publication date: 6 March 1997
Published in: Neural Computation
Full work available at URL: https://doi.org/10.1162/neco.1997.9.1.1
Related Items (28)
- Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry
- Global optimization issues in deep network regression: an overview
- ‘Place-cell’ emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
- On Different Facets of Regularization Theory
- Machine learning the kinematics of spherical particles in fluid flows
- Archetypal landscapes for deep neural networks
- The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
- Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
- Unnamed Item
- Geometric characterization of the Eyring-Kramers formula
- Lotka-Volterra model with mutations and generative adversarial networks
- Diametrical risk minimization: theory and computations
- Optimization for deep learning: an overview
- Unnamed Item
- Unnamed Item
- Interpretable machine learning: fundamental principles and 10 grand challenges
- A spin glass model for the loss surfaces of generative adversarial networks
- Noise-induced degeneration in online learning
- Minimum description length revisited
- Unnamed Item
- Entropy-SGD: biasing gradient descent into wide valleys
- Universal statistics of Fisher information in deep neural networks: mean field approach
- Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures
- Structure-preserving deep learning
- Hausdorff dimension, heavy tails, and generalization in neural networks
- Entropic gradient descent algorithms and wide flat minima
- Adaptive regularization parameter selection method for enhancing generalization capability of neural networks
- Prediction errors for penalized regressions based on generalized approximate message passing
Cites Work
- A Mathematical Theory of Communication
- Modeling by shortest data description
- Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation
- Statistical predictor identification
- Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter
- An Information Measure for Classification
This page was built for publication: Flat Minima