Infinite-dimensional gradient-based descent for alpha-divergence minimisation
From MaRDI portal
Publication:2054493
Abstract: This paper introduces the (α, Γ)-descent, an iterative algorithm that operates on measures and performs α-divergence minimisation in a Bayesian framework. This gradient-based procedure extends the commonly used variational approximation by adding a prior on the variational parameters in the form of a measure. We prove that, for a rich family of functions Γ, this algorithm leads at each step to a systematic decrease in the α-divergence, and we derive convergence results. Our framework recovers the Entropic Mirror Descent algorithm and provides an alternative algorithm that we call the Power Descent. Moreover, in its stochastic formulation, the (α, Γ)-descent allows one to optimise the mixture weights of any given mixture model without any information on the underlying distribution of the variational parameters. This renders our method compatible with many choices of parameter updates and applicable to a wide range of machine learning tasks. We demonstrate empirically, on both toy and real-world examples, the benefit of using the Power Descent and going beyond the Entropic Mirror Descent framework, which fails as the dimension grows.
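As a point of reference for the Entropic Mirror Descent baseline the abstract mentions, the following is a minimal sketch of entropic mirror descent on the simplex of mixture weights, applied to a toy KL-divergence objective. It is not the paper's (α, Γ)-descent or Power Descent; the target density, component locations, step size, and sample sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: a 1-D mixture of three unit-variance Gaussians at fixed
# locations, whose weights we tune towards an unnormalised Gaussian target.
locs = np.array([-2.0, 0.0, 2.0])                   # fixed component means
target = lambda x: np.exp(-0.5 * (x - 1.0) ** 2)    # unnormalised target density

def component_logpdf(x, mu):
    # log density of N(mu, 1)
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

lam = np.full(3, 1.0 / 3.0)   # mixture weights, a point on the simplex
eta = 0.5                     # mirror-descent step size (arbitrary choice)

for _ in range(100):
    # Draw samples from the current mixture.
    ks = rng.choice(3, size=500, p=lam)
    xs = rng.normal(locs[ks], 1.0)
    # Monte Carlo estimate of the per-component gradient of KL(mixture || target):
    # g_j = E_{X ~ q_j}[ log(mixture(X) / p(X)) ], up to an additive constant
    # that cancels after renormalisation.
    logmix = np.logaddexp.reduce(
        np.log(lam)[None, :] + component_logpdf(xs[:, None], locs[None, :]),
        axis=1,
    )
    ratio = logmix - np.log(target(xs))
    g = np.array([ratio[ks == j].mean() if np.any(ks == j) else 0.0
                  for j in range(3)])
    # Entropic mirror descent: multiplicative update, then renormalise.
    lam = lam * np.exp(-eta * g)
    lam /= lam.sum()

print(np.round(lam, 2))  # weights shift towards the components near the target mode
```

The multiplicative-update-and-renormalise step is exactly the mirror descent update with the negative-entropy mirror map; the paper's Power Descent replaces this exponential weighting with a power transform of the same per-component quantities.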
Cites work
- scientific article; zbMATH DE number 6377992 (title unavailable)
- scientific article; zbMATH DE number 3173999 (title unavailable)
- scientific article; zbMATH DE number 1222284 (title unavailable)
- scientific article; zbMATH DE number 3200971 (title unavailable)
- doi:10.1162/jmlr.2003.3.4-5.993
- A Stochastic Approximation Method
- Adaptive importance sampling in Monte Carlo integration
- An introduction to variational methods for graphical models
- Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo
- Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference
- Convergence of adaptive mixtures of importance sampling schemes
- Convex optimization: algorithms and complexity
- Efficiency versus robustness: The case for minimum Hellinger distance and related methods
- Expectation Propagation in the Large Data Limit
- Families of alpha-, beta- and gamma-divergences: flexible and robust measures of similarities
- Large-scale machine learning with stochastic gradient descent
- Markov Processes and the H-Theorem
- Mirror descent and nonlinear projected subgradient methods for convex optimization.
- On Information and Sufficiency
- Optimal global rates of convergence for nonparametric regression
- Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems
- Robust Stochastic Approximation Approach to Stochastic Programming
- Rényi Divergence and Kullback-Leibler Divergence
- Safe adaptive importance sampling: a mixture approach
Cited in (3)