Flat Minima


Publication: 3123284

DOI: 10.1162/NECO.1997.9.1.1
zbMath: 0872.68150
OpenAlex: W2912811302
Wikidata: Q34422981
Scholia: Q34422981
MaRDI QID: Q3123284

Sepp Hochreiter, Jürgen Schmidhuber

Publication date: 6 March 1997

Published in: Neural Computation

Full work available at URL: https://doi.org/10.1162/neco.1997.9.1.1




Related Items (28)

Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry*
Global optimization issues in deep network regression: an overview
‘Place-cell’ emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
On Different Facets of Regularization Theory
Machine learning the kinematics of spherical particles in fluid flows
Archetypal landscapes for deep neural networks
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
Unnamed Item
Geometric characterization of the Eyring-Kramers formula
Lotka-Volterra model with mutations and generative adversarial networks
Diametrical risk minimization: theory and computations
Optimization for deep learning: an overview
Unnamed Item
Unnamed Item
Interpretable machine learning: fundamental principles and 10 grand challenges
A spin glass model for the loss surfaces of generative adversarial networks
Noise-induced degeneration in online learning
Minimum description length revisited
Unnamed Item
Entropy-SGD: biasing gradient descent into wide valleys
Universal statistics of Fisher information in deep neural networks: mean field approach*
Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures
Structure-preserving deep learning
Hausdorff dimension, heavy tails, and generalization in neural networks*
Entropic gradient descent algorithms and wide flat minima*
Adaptive regularization parameter selection method for enhancing generalization capability of neural networks
Prediction errors for penalized regressions based on generalized approximate message passing




Cites Work




This page was built for publication: Flat Minima