Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization
Publication: 5281236
DOI: 10.1109/TIT.2010.2068870 · zbMath: 1366.62071 · arXiv: 0809.0853 · MaRDI QID: Q5281236
Michael I. Jordan, XuanLong Nguyen, Martin J. Wainwright
Publication date: 27 July 2017
Published in: IEEE Transactions on Information Theory
Abstract: We develop and analyze $M$-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a non-asymptotic variational characterization of $f$-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these estimators. Given conditions only on the ratios of densities, we show that our estimators can achieve optimal minimax rates for the likelihood ratio and the divergence functionals in certain regimes. We derive an efficient optimization algorithm for computing our estimates, and illustrate their convergence behavior and practical viability by simulations.
Full work available at URL: https://arxiv.org/abs/0809.0853
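The method described in the abstract rests on the variational (Fenchel-conjugate) representation of an $f$-divergence, $D_f(P\|Q) = \sup_g \, \mathbb{E}_P[g(X)] - \mathbb{E}_Q[f^*(g(X))]$, where $f^*$ is the convex conjugate of $f$; replacing the expectations by empirical averages and maximizing over a function class turns divergence estimation into convex empirical risk minimization, and the maximizing $g$ also recovers the likelihood ratio $dP/dQ$ through the link $\partial f^*$. The sketch below is not the authors' implementation: it is a minimal illustration of this idea for the Kullback-Leibler case, $f(u) = u \log u$, $f^*(t) = \exp(t-1)$, using a hypothetical affine feature map and plain gradient ascent rather than the kernel-based convex program studied in the paper.

import numpy as np

def estimate_kl(x_p, x_q, features, n_steps=2000, lr=0.05, seed=0):
    """Variational KL estimate from samples x_p ~ P and x_q ~ Q.
    `features` maps an (n, d) array to an (n, k) feature matrix; the
    witness function is g(x) = features(x) @ w."""
    rng = np.random.default_rng(seed)
    phi_p, phi_q = features(x_p), features(x_q)
    w = rng.normal(scale=0.01, size=phi_p.shape[1])
    for _ in range(n_steps):
        # Ascend the concave objective  mean(phi_p @ w) - mean(exp(phi_q @ w - 1)).
        weights_q = np.exp(phi_q @ w - 1.0)
        grad = phi_p.mean(axis=0) - (weights_q[:, None] * phi_q).mean(axis=0)
        w += lr * grad
    kl_hat = (phi_p @ w).mean() - np.exp(phi_q @ w - 1.0).mean()
    return kl_hat, w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_p = rng.normal(loc=1.0, scale=1.0, size=(5000, 1))   # P = N(1, 1)
    x_q = rng.normal(loc=0.0, scale=1.0, size=(5000, 1))   # Q = N(0, 1)
    affine = lambda x: np.hstack([np.ones_like(x), x])      # hypothetical feature map
    kl_hat, w = estimate_kl(x_p, x_q, affine)
    # For these distributions the true KL divergence is 0.5, and the fitted
    # g(x) approximates log(dP/dQ)(x) + 1 = x + 0.5.
    print(f"estimated KL: {kl_hat:.3f}")

In this toy example the optimal witness $x + 0.5$ lies inside the affine class, so the estimate should land near the true value $0.5$ up to sampling error; the paper's analysis concerns richer nonparametric classes and the resulting minimax rates.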
Nonparametric regression and quantile regression (62G08) Asymptotic properties of nonparametric inference (62G20) Nonparametric estimation (62G05) Convex programming (90C25)
Related Items (55)
GAT–GMM: Generative Adversarial Training for Gaussian Mixture Models ⋮ Non-parametric estimation of mutual information through the entropy of the linkage ⋮ A Monte Carlo approach to quantifying model error in Bayesian parameter estimation ⋮ Statistical analysis of distance estimators with density differences and density ratios ⋮ Model Uncertainty and Correctability for Directed Graphical Models ⋮ Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation ⋮ Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation ⋮ Smoothed noise contrastive mutual information neural estimation ⋮ Data-driven spatiotemporal modeling for structural dynamics on irregular domains by stochastic dependency neural estimation ⋮ A Deep Generative Approach to Conditional Sampling ⋮ On distributionally robust extreme value analysis ⋮ Stein variational gradient descent with learned direction ⋮ Geometry of EM and related iterative algorithms ⋮ Geometrical Insights for Implicit Generative Modeling ⋮ Statistical analysis of kernel-based least-squares density-ratio estimation ⋮ Level sets semimetrics for probability measures with applications in hypothesis testing ⋮ Aggregated tests based on supremal divergence estimators for non-regular statistical models ⋮ Computational complexity of kernel-based density-ratio estimation: a condition number analysis ⋮ On the empirical estimation of integral probability metrics ⋮ Convergence of latent mixing measures in finite and infinite mixture models ⋮ Least-squares two-sample test ⋮ Density-Difference Estimation ⋮ Conditional Density Estimation with Dimensionality Reduction via Squared-Loss Conditional Entropy Minimization ⋮ Online Direct Density-Ratio Estimation Applied to Inlier-Based Outlier Detection ⋮ Direct Density Derivative Estimation ⋮ Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation ⋮ Probabilistic model validation for uncertain nonlinear systems ⋮ Nonparametric Estimation of Küllback-Leibler Divergence ⋮ Semi-supervised learning of class balance under class-prior change by distribution matching ⋮ Minimum Divergence, Generalized Empirical Likelihoods, and Higher Order Expansions ⋮ Bias Reduction and Metric Learning for Nearest-Neighbor Estimation of Kullback-Leibler Divergence ⋮ Unnamed Item ⋮ Variational Representations and Neural Network Estimation of Rényi Divergences ⋮ Robust Actuarial Risk Analysis ⋮ Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search ⋮ Quantization and clustering with Bregman divergences ⋮ Change-point detection in time-series data by relative density-ratio estimation ⋮ Modern Bayesian experimental design ⋮ Calibrated adversarial algorithms for generative modelling ⋮ Reducing the statistical error of generative adversarial networks using space-filling sampling ⋮ Variational representations of annealing paths: Bregman information under monotonic embedding ⋮ Optimal experimental design: formulations and computations ⋮ Learning under nonstationarity: covariate shift and class-balance change ⋮ Non-parametric two-sample tests: recent developments and prospects ⋮ Robust Validation: Confident Predictions Even When Distributions Shift ⋮ Machine learning with squared-loss mutual information ⋮ Variational Bayesian optimal experimental design with normalizing flows ⋮ Adaptive joint distribution learning ⋮ Imitation Learning as f-Divergence Minimization ⋮ Constructive setting for problems of density ratio estimation ⋮ Relative Density-Ratio Estimation for Robust Distribution Comparison ⋮ Improving bridge estimators via \(f\)-GAN ⋮ Unnamed Item ⋮ Solving Inverse Stochastic Problems from Discrete Particle Observations Using the Fokker--Planck Equation and Physics-Informed Neural Networks ⋮ Formulation and properties of a divergence used to compare probability measures without absolute continuity