scientific article; zbMATH DE number 7306852
From MaRDI portal
Publication:5148924
Publication date: 5 February 2021
Full work available at URL: https://arxiv.org/abs/1412.1193
Title: New insights and perspectives on the natural gradient method
Keywords: neural networks; convergence rate; parameterization invariance; natural gradient methods; 2nd-order optimization
Related Items (30)
- A fully stochastic second-order trust region method
- QNG: A Quasi-Natural Gradient Method for Large-Scale Statistical Learning
- Information geometry of physics-informed statistical manifolds and its use in data assimilation
- Sketch-based empirical natural gradient methods for deep learning
- Model-Centric Data Manifold: The Data Through the Eyes of the Model
- A distributed optimisation framework combining natural gradient with Hessian-free for discriminative sequence training
- Epistemic uncertainty quantification in deep learning classification by the delta method
- Approximate Newton Policy Gradient Algorithms
- Semi-implicit back propagation
- Robust federated learning under statistical heterogeneity via hessian-weighted aggregation
- An overview of stochastic quasi-Newton methods for large-scale machine learning
- On the locality of the natural gradient for learning in deep Bayesian networks
- Invariance properties of the natural gradient in overparametrised systems
- Efficient Natural Gradient Descent Methods for Large-Scale PDE-Based Optimization Problems
- Unnamed Item
- Geometry and convergence of natural policy gradient methods
- Deep learning and geometric deep learning: An introduction for mathematicians and physicists
- Multi-agent natural actor-critic reinforcement learning algorithms
- The limited-memory recursive variational Gaussian approximation (L-RVGA)
- Riemannian Natural Gradient Methods
- Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions
- Optimization Methods for Large-Scale Machine Learning
- Stochastic sub-sampled Newton method with variance reduction
- Warped Riemannian Metrics for Location-Scale Models
- Variational Bayes on manifolds
- The recursive variational Gaussian approximation (R-VGA)
- Laplace approximation and natural gradient for Gaussian process regression with heteroscedastic Student-\(t\) model
- Structure-preserving deep learning
- Understanding approximate Fisher information for fast convergence of natural gradient descent in wide neural networks*
- Parametrisation independence of the natural gradient in overparametrised systems
Uses Software
Cites Work
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Introductory lectures on convex optimization. A basic course.
- The ubiquitous Kronecker product
- Online natural gradient as a Kalman filter
- Hessian Matrix vs. Gauss–Newton Hessian Matrix
- Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
- Computing a Trust Region Step
- The Conjugate Gradient Method and Trust Regions in Large Scale Optimization
- Preconditioning of Truncated-Newton Methods
- An Algorithm for Least-Squares Estimation of Nonlinear Parameters
- Inexact Newton Methods
- Acceleration of Stochastic Approximation by Averaging
- Trust Region Methods
- Riemannian metrics for neural networks I: feedforward networks
- Trace bounds on the solution of the algebraic matrix Riccati and Lyapunov equation
- Iterative Solution of Nonlinear Equations in Several Variables
- Logarithmic Regret Algorithms for Online Convex Optimization
- Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
- 75.9 Euler’s Constant
- On‐line learning for very large data sets
- New Classes of Synchronous Codes
- A method for the solution of certain non-linear problems in least squares