Sparse optimization on measures with over-parameterized gradient descent
Abstract: Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical problem arising, e.g., in sparse spikes deconvolution or two-layer neural network training. We show that this problem can be solved by discretizing the measure and running non-convex gradient descent on the positions and weights of the particles. For measures on a d-dimensional manifold and under some non-degeneracy assumptions, this leads to a global optimization algorithm with a complexity scaling as log(1/ε) in the desired accuracy ε, instead of ε^{-d} for convex methods. The key theoretical tools are a local convergence analysis in Wasserstein space and an analysis of a perturbed mirror descent in the space of measures. Our bounds involve quantities that are exponential in d, which is unavoidable under our assumptions.
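To illustrate the approach described in the abstract, below is a minimal sketch (not the authors' code) of over-parameterized particle gradient descent for 1D sparse spikes deconvolution: the measure is discretized into many weighted particles and plain gradient descent is run jointly on their positions and weights. The Gaussian measurement kernel, step size, penalty weight, particle count, and the sign subgradient for the sparsity penalty are illustrative assumptions; the paper's actual algorithm and its mirror-descent analysis use a specific particle parameterization not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth sparse measure: three spikes on [0, 1] (illustrative).
true_pos = np.array([0.25, 0.5, 0.8])
true_w = np.array([1.0, -0.7, 0.5])

# Observations y_k = sum_i w_i * phi(x_i, t_k) with a Gaussian kernel phi (assumed).
t = np.linspace(0.0, 1.0, 50)
sigma = 0.05  # kernel width (assumed)

def features(x):
    """Feature matrix Phi[i, k] = exp(-(t_k - x_i)^2 / (2 sigma^2))."""
    return np.exp(-((t[None, :] - x[:, None]) ** 2) / (2.0 * sigma ** 2))

y = true_w @ features(true_pos)

# Over-parameterization: many more particles than true spikes, spread over the domain.
m = 100
pos = np.linspace(0.0, 1.0, m)
w = 1e-3 * rng.standard_normal(m)

lam, lr, n_steps = 0.01, 0.01, 20000  # penalty weight and step size (assumed)

for _ in range(n_steps):
    Phi = features(pos)
    resid = w @ Phi - y
    # Objective: 0.5 * ||w @ Phi - y||^2 + lam * sum(|w_i|)
    grad_w = Phi @ resid + lam * np.sign(w)
    dPhi = Phi * (t[None, :] - pos[:, None]) / sigma ** 2  # d(phi)/d(x_i)
    grad_pos = w * (dPhi @ resid)
    w -= lr * grad_w
    pos -= lr * grad_pos

# Particles with non-negligible weight should cluster near the true spike
# positions when recovery succeeds.
keep = np.abs(w) > 1e-2
print(np.round(pos[keep], 3))
print(np.round(w[keep], 3))
```

The over-parameterization (starting from many particles spread over the domain rather than one per expected spike) is what the abstract refers to: with enough initial particles, the non-convex descent on positions and weights can reach the global optimum of the underlying convex problem on measures.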
Recommendations
- Global convergence analysis of sparse regular nonconvex optimization problems
- Analysis of a two-layer neural network via displacement convexity
- Gradient descent with non-convex constraints: local concavity determines convergence
- Linear convergence of accelerated conditional gradient algorithms in spaces of measures
- On the minimization of a Tikhonov functional with a non-convex sparsity constraint
Cites work
- scientific article; zbMATH DE number 3680123 (no title available)
- scientific article; zbMATH DE number 3790208 (no title available)
- scientific article; zbMATH DE number 51537 (no title available)
- scientific article; zbMATH DE number 1061412 (no title available)
- scientific article; zbMATH DE number 1972910 (no title available)
- scientific article; zbMATH DE number 2107836 (no title available)
- scientific article; zbMATH DE number 5223994 (no title available)
- scientific article; zbMATH DE number 3313108 (no title available)
- A JKO splitting scheme for Kantorovich-Fisher-Rao gradient flows
- A course in metric geometry
- A family of functional inequalities: Łojasiewicz inequalities and displacement convex functions
- A mean field view of the landscape of two-layer neural networks
- A new optimal transport distance on the space of finite Radon measures
- Accelerated information gradient flow
- An interpolating distance between optimal transport and Fisher-Rao metrics
- An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Breaking the curse of dimensionality with convex neural networks
- Compressed Sensing Off the Grid
- Deep learning
- Exact Solutions to Super Resolution on Semi-Algebraic Domains in Higher Dimensions
- Exact reconstruction using Beurling minimal extrapolation
- Exact solutions of infinite dimensional total-variation regularized problems
- Exact support recovery for sparse spikes deconvolution
- Gradient flows in metric spaces and in the space of probability measures
- Inverse problems in spaces of measures
- Kurdyka–Łojasiewicz–Simon inequality for gradient flows in metric spaces
- Mean field analysis of neural networks: a law of large numbers
- Mirror descent and nonlinear projected subgradient methods for convex optimization
- Natural gradient via optimal transport
- On representer theorems and convex regularization
- On the Rate of Convergence of Empirical Measures in ∞-transportation Distance
- On the linear convergence rates of exchange and continuous methods for total variation minimization
- Optimal entropy-transport problems and a new Hellinger-Kantorovich distance between positive measures
- Optimal transport for applied mathematicians. Calculus of variations, PDEs, and modeling
- Optimization with sparsity-inducing penalties
- Poincaré and logarithmic Sobolev inequalities by decomposition of the energy landscape
- Positive trigonometric polynomials and signal processing applications
- Probabilistic representation and uniqueness results for measure-valued solutions of transport equations
- Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance
- Sparse modeling for image and vision processing
- The alternating descent conditional gradient method for sparse inverse problems
- The basins of attraction of the global minimizers of the non-convex sparse spike estimation problem
- The sliding Frank-Wolfe algorithm and its application to super-resolution microscopy
- Towards a Mathematical Theory of Super‐resolution
- Unbalanced optimal transport: dynamic and Kantorovich formulations
Cited in (12)
- Global convergence analysis of sparse regular nonconvex optimization problems
- Estimation of off-the grid sparse spikes with over-parametrized projected gradient descent: theory and application
- Proximal methods for point source localisation
- A rigorous framework for the mean field limit of multilayer neural networks
- On the uniqueness of solutions for the basis pursuit in the continuum
- More is Less: Inducing Sparsity via Overparameterization
- Learning sparse features can lead to overfitting in neural networks
- Localization of point scatterers via sparse optimization on measures
- Simultaneous off-the-grid learning of mixtures issued from a continuous dictionary
- Regularizing Orientation Estimation in Cryogenic Electron Microscopy Three-Dimensional Map Refinement through Measure-Based Lifting over Riemannian Manifolds
- Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
- Convergence rates of gradient methods for convex optimization in the space of measures