Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution
From MaRDI portal
Publication:2189396
Abstract: Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or lacks performance guarantees altogether. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory that stays within a basin of nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn yields substantial computational savings. Focusing on three fundamental statistical estimation problems, namely phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control, measured both entrywise and in spectral norm, which might be of independent interest.
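The vanilla procedure the abstract refers to can be sketched for the phase retrieval case: spectrally initialize, then run plain gradient descent on the quartic least-squares loss with a constant, aggressive step size and no trimming or projection. The sketch below is illustrative only, under assumed Gaussian sampling vectors; the problem sizes and the heuristic step size are our choices, not values prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (the theory needs m on the order of n * polylog(n))
n, m = 50, 2000
x_star = rng.standard_normal(n)
A = rng.standard_normal((m, n))   # Gaussian sampling vectors a_i
y = (A @ x_star) ** 2             # phaseless measurements y_i = (a_i^T x)^2

# Spectral initialization: leading eigenvector of (1/m) sum_i y_i a_i a_i^T,
# rescaled so its norm matches the estimated signal energy (E[y_i] = ||x||^2).
Y = (A.T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(Y)        # eigh returns ascending eigenvalues
x = eigvecs[:, -1] * np.sqrt(y.mean())

# Vanilla gradient descent on f(x) = (1/4m) sum_i ((a_i^T x)^2 - y_i)^2,
# with a constant step size and no explicit regularization.
eta = 0.1 / y.mean()   # heuristic: step of order c / ||x_star||^2
for _ in range(200):
    r = A @ x
    grad = A.T @ ((r ** 2 - y) * r) / m
    x -= eta * grad

# Recovery error, up to the unresolvable global sign ambiguity
err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print(err / np.linalg.norm(x_star))
```

With these (assumed) dimensions the iterates converge linearly to a small relative error; the point of the paper is that this happens even though no explicit regularization keeps the trajectory incoherent with the sampling vectors.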
Recommendations
- Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval
- Gradient descent with non-convex constraints: local concavity determines convergence
- Nonconvex Low-Rank Tensor Completion from Noisy Data
- The nonsmooth landscape of phase retrieval
- Fast global convergence of gradient methods for high-dimensional statistical recovery
Cites work
- scientific article; zbMATH DE number 204193
- scientific article; zbMATH DE number 6026126
- 10.1162/153244302760200704
- A Probabilistic and RIPless Theory of Compressed Sensing
- A flexible convex relaxation for phase retrieval
- A nonconvex approach for phase retrieval: reshaped Wirtinger flow and incremental algorithms
- A note on \(\sin\Theta\) theorems for singular subspace variations
- A simpler approach to matrix completion
- A tail inequality for quadratic forms of subgaussian random vectors
- A useful variant of the Davis-Kahan theorem for statisticians
- A well-tempered landscape for non-convex robust subspace recovery
- ADMiRA: Atomic Decomposition for Minimum Rank Approximation
- An elementary proof of convex phase retrieval in the natural parameter space via the linear program PhaseMax
- An introduction to matrix concentration inequalities
- Blind Deconvolution Using Convex Programming
- Blind Recovery of Sparse Signals From Subsampled Convolution
- Blind deconvolution by a steepest descent algorithm on a quotient manifold
- Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture
- Compressive Phase Retrieval via Generalized Approximate Message Passing
- Concentration and moment inequalities for polynomials of independent random variables
- Convex Recovery of a Structured Signal from Independent Random Linear Measurements
- Convex optimization: algorithms and complexity
- Debiasing the Lasso: optimal sample size for Gaussian designs
- Estimating the matrix \(p\)-norm
- Exact and Stable Covariance Estimation From Quadratic Sampling via Convex Programming
- Exact matrix completion via convex optimization
- Fast and Guaranteed Blind Multichannel Deconvolution Under a Bilinear System Model
- Fast rank-one alternating minimization algorithm for phase retrieval
- Fast, robust and non-convex subspace recovery
- Fundamental limits of weak recovery with applications to phase retrieval
- GESPAR: Efficient Phase Retrieval of Sparse Signals
- Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval
- Guaranteed Matrix Completion via Non-Convex Factorization
- Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization
- Guarantees of Riemannian optimization for low rank matrix recovery
- Hanson-Wright inequality and sub-Gaussian concentration
- Incoherence-Optimal Matrix Completion
- Inference and uncertainty quantification for noisy matrix completion
- Low rank matrix completion by alternating steepest descent methods
- Low-rank matrix completion using alternating minimization
- Matrix Completion From a Few Entries
- Matrix completion and low-rank SVD via fast alternating least squares
- Matrix completion from noisy entries
- Near-optimal bounds for phase synchronization
- Non-Convex Phase Retrieval From STFT Measurements
- Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
- Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion
- On robust regression with high-dimensional predictors
- Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow
- Oracle inequalities in empirical risk minimization and sparse recovery problems. École d'Été de Probabilités de Saint-Flour XXXVIII-2008.
- Orthogonal Procrustes rotation for two or more matrices
- Perturbation Bounds for the Polar Decomposition
- Perturbation bounds for matrix square roots and Pythagorean sums
- Perturbation bounds in connection with singular value decomposition
- Phase Retrieval Using Alternating Minimization
- Phase retrieval via Wirtinger flow: theory and algorithms
- Phase retrieval via matrix completion
- Phase retrieval via randomized Kaczmarz: theoretical guarantees
- Phase retrieval with one or two diffraction patterns by alternating projections with the null initialization
- PhaseMax: Convex Phase Retrieval via Basis Pursuit
- ROP: matrix recovery via rank-one projections
- Rank-Sparsity Incoherence for Matrix Decomposition
- Rapid, robust, and reliable blind deconvolution via nonconvex optimization
- Recovering Low-Rank Matrices From Few Coefficients in Any Basis
- Regularized gradient descent: a non-convex recipe for fast joint blind deconvolution and demixing
- Restricted strong convexity and weighted matrix completion: optimal bounds with noise
- Robust principal component analysis?
- Self-calibration and biconvex compressive sensing
- Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems
- Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow
- Solving quadratic equations via phaselift when there are about as many equations as unknowns
- Solving systems of phaseless equations via Kaczmarz methods: a proof of concept study
- Sparse Phase Retrieval via Truncated Amplitude Flow
- Spectral method and regularized MLE are both optimal for top-\(K\) ranking
- Structured Signal Recovery From Quadratic Measurements: Breaking Sample Complexity Barriers via Nonconvex Optimization
- Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization
- The Power of Convex Relaxation: Near-Optimal Matrix Completion
- The Rotation of Eigenvectors by a Perturbation. III
- The implicit bias of gradient descent on separable data
- The landscape of empirical risk for nonconvex losses
- The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square
- The local convexity of solving systems of quadratic equations
- The probabilistic method. With an appendix on the life and work of Paul Erdős.
- The projected power method: an efficient algorithm for joint alignment from pairwise differences
- The spectral norm of a nonnegative matrix
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
Cited in (43)
- An equivalence between critical points for rank constraints versus low-rank factorizations
- Byzantine-robust distributed sparse learning for \(M\)-estimation
- Robust recovery of low-rank matrices with non-orthogonal sparse decomposition from incomplete measurements
- Non-convex exact community recovery in stochastic block model
- First-Order Methods for Nonconvex Quadratic Minimization
- Bridging convex and nonconvex optimization in robust PCA: noise, outliers and missing data
- Subspace estimation from unbalanced and incomplete data matrices: \({\ell_{2,\infty}}\) statistical guarantees
- Lower bounds for non-convex stochastic optimization
- Estimation from nonlinear observations via convex programming with application to bilinear regression
- Median-truncated gradient descent: a robust and scalable nonconvex approach for signal estimation
- Gradient descent for deep matrix factorization: dynamics and implicit bias towards low rank
- Gradient descent with non-convex constraints: local concavity determines convergence
- Regularized gradient descent: a non-convex recipe for fast joint blind deconvolution and demixing
- High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity
- Inference for heteroskedastic PCA with missing data
- Proof methods for robust low-rank matrix recovery
- A diffusion process perspective on posterior contraction rates for parameters
- scientific article; zbMATH DE number 7561466
- Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval
- Smoothed amplitude flow-based phase retrieval algorithm
- Near-optimal performance bounds for orthogonal and permutation group synchronization via spectral methods
- Misspecified nonconvex statistical optimization for sparse phase retrieval
- Spectral method and regularized MLE are both optimal for top-\(K\) ranking
- Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent
- The numerics of phase retrieval
- Approximate message passing with spectral initialization for generalized linear models*
- scientific article; zbMATH DE number 7307468
- Noisy matrix completion: understanding statistical guarantees for convex relaxation via nonconvex optimization
- Matrix completion with nonconvex regularization: spectral operators and scalable algorithms
- Normal approximation and confidence region of singular subspaces
- Inference for low-rank completion without sample splitting with application to treatment effect estimation
- Algorithmic Regularization in Model-Free Overparametrized Asymmetric Matrix Factorization
- GNMR: a provable one-line algorithm for low rank matrix recovery
- scientific article; zbMATH DE number 7415093
- Low-rank matrix recovery under heavy-tailed errors
- Improved performance guarantees for orthogonal group synchronization via generalized power method
- Near-optimal bounds for generalized orthogonal Procrustes problem via generalized power method
- Nonconvex Low-Rank Tensor Completion from Noisy Data
- Adaptive iterative hard thresholding for low-rank matrix recovery and rank-one measurements
- Truncated amplitude flow with coded diffraction patterns
- scientific article; zbMATH DE number 7370581
- Asymmetry helps: eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices
- Entrywise eigenvector analysis of random matrices with low expected rank