Global and local two-sample tests via regression
From MaRDI portal
Publication:2283577
Keywords: random forests, kernel regression, permutation test, intrinsic dimension, nearest neighbor regression, galaxy morphology
MSC classification: Nonparametric hypothesis testing (62G10); Asymptotic properties of nonparametric inference (62G20); Classification and discrimination; cluster analysis (statistical aspects) (62H30); Linear regression; mixed models (62J05); Hypothesis testing in multivariate analysis (62H15); Image analysis in multivariate analysis (62H35); Applications of statistics to physics (62P35); Galactic and stellar structure (85A15)
Abstract: Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature, there have been recent methodological developments such as classification accuracy tests. The goal of this work is to present a regression approach to comparing multivariate distributions of complex data. Depending on the chosen regression model, our framework can efficiently handle different types of variables and various structures in the data, with competitive power under many practical scenarios. Whereas previous work has been largely limited to global tests, which conceal much of the local information, our approach naturally leads to a local two-sample testing framework in which we identify local differences between multivariate distributions with statistical confidence. We demonstrate the efficacy of our approach both theoretically and empirically, under some well-known parametric and nonparametric regression methods. Our proposed methods are applied to simulated data as well as a challenging astronomy data set to assess their practical usefulness.
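The regression idea in the abstract can be illustrated with a minimal sketch. Assumptions (not taken from the paper): label the first sample 0 and the second 1, pool them, and regress the label Y on x. If the two distributions are equal, the regression function P(Y = 1 | x) equals the constant pi = n1 / (n0 + n1) for every x, so departures from pi signal a difference. The sketch uses k-nearest-neighbor regression (one of the methods listed in the keywords), a hypothetical global statistic (mean squared departure of the fitted function from pi on a held-out half), a hypothetical local statistic |m_hat(x0) - pi| at a chosen point x0, and permutation p-values. The function names `knn_regress`, `global_regression_test`, and `local_regression_test`, the 50/50 split, and the defaults k=10 and n_perm=200 are illustrative choices, not the authors' exact procedure.

```python
import numpy as np


def knn_regress(X_train, y_train, X_query, k):
    """k-nearest-neighbor estimate of E[y | x] at each query point."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each query point
    nn = np.argsort(d2, axis=1)[:, :k]
    return y_train[nn].mean(axis=1)


def _pool(X0, X1):
    """Stack the two samples and attach 0/1 labels (illustrative helper)."""
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(len(X0)), np.ones(len(X1))])
    return X, y


def global_regression_test(X0, X1, k=10, n_perm=200, seed=0):
    """Permutation test of H0: both samples come from one distribution.

    Hypothetical statistic: fit k-NN regression of the label on x using a
    random half of the pooled data, then average (m_hat - pi)^2 over the
    other half. Returns (observed statistic, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    X, y = _pool(X0, X1)
    n = len(y)
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2:]
    pi = y.mean()  # under H0, P(Y=1 | x) equals this constant for all x

    def stat(labels):
        m_hat = knn_regress(X[train], labels[train], X[test], k)
        return np.mean((m_hat - pi) ** 2)

    observed = stat(y)
    # Under H0 the labels are exchangeable, so permuting them (with the
    # split held fixed) draws from the null distribution of the statistic.
    null = np.array([stat(rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed - 1e-12)) / (1 + n_perm)
    return observed, p_value


def local_regression_test(X0, X1, x0, k=10, n_perm=200, seed=0):
    """Permutation p-value for the local H0: P(Y=1 | x0) equals pi."""
    rng = np.random.default_rng(seed)
    X, y = _pool(X0, X1)
    pi = y.mean()

    def stat(labels):
        # Absolute departure of the fitted regression at x0 from pi
        return abs(knn_regress(X, labels, x0[None, :], k)[0] - pi)

    observed = stat(y)
    null = np.array([stat(rng.permutation(y)) for _ in range(n_perm)])
    # Small tolerance so floating-point ties still count as ties
    return (1 + np.sum(null >= observed - 1e-12)) / (1 + n_perm)
```

Any other regression method (kernel regression, random forests, logistic regression) could replace `knn_regress` without affecting the validity of the permutation null, since that validity rests only on label exchangeability under H0. Testing many points x0 at once would additionally call for a multiple-testing correction.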
Cites work
- scientific article; zbMATH DE number 1964693
- scientific article; zbMATH DE number 893887
- A consistent test of functional form via nonparametric estimation techniques
- A distribution-free theory of nonparametric regression
- A goodness-of-fit test for logistic regression models based on case-control data
- A kernel two-sample test
- A power comparison between nonparametric regression tests.
- A review of 20 years of naive tests of significance for high-dimensional mean vectors and covariance matrices
- A sharper Bonferroni procedure for multiple tests of significance
- All of Nonparametric Statistics
- An empirical comparison of ensemble methods based on classification trees
- An estimate of the remainder in a combinatorial central limit theorem
- An updated review of goodness-of-fit tests for regression models
- Analysis of a random forests model
- Classification accuracy as a proxy for two-sample testing
- Comparing distributions
- Comparing nonparametric versus parametric regression fits
- Comparing two samples by penalized logistic regression
- Diffusion maps
- Dimension reduction and variable selection in case control studies via regularized likelihood optimization
- Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps
- High-dimensional generalized linear models and the lasso
- High-order accurate methods for retrospective sampling problems
- Information-theoretic determination of minimax rates of convergence
- Least-squares two-sample test
- Lectures on the nearest neighbor method
- Local significant differences from nonparametric two-sample tests
- Logistic disease incidence models and case-control studies
- Maximum likelihood for generalized case-control studies
- Minimax Testing of Nonparametric Hypotheses on a Distribution Density in the $L_p$ Metrics
- Non-asymptotic minimax rates of testing in signal detection
- Nonparametric smoothing and lack-of-fit tests
- On a new multivariate two-sample test.
- On high dimensional two-sample tests based on nearest neighbors
- On robust estimation in logistic case-control studies
- On the use of random forest for two-sample testing
- Permutation tests for studying classifier performance
- Random forests
- Rates of convergence for the \(k\)-nearest neighbor estimators with smoother regression functions
- Separate sample logistic discrimination
- Statistics for high-dimensional data. Methods, theory and applications.
- Test of homogeneity in semiparametric two-sample density ratio models
- Testing Statistical Hypotheses
- Testing a linear regression model against nonparametric alternatives
- Testing the hypothesis of a general linear model using nonparametric regression estimation
- The classification permutation test: a flexible approach to testing for covariate imbalance in observational studies
- Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates
Cited in (12)
- Globaltest confidence regions and their application to ridge regression
- Testing under local misspecification and artificial regressions
- Detecting distributional differences in labeled sequence data with application to tropical cyclone satellite imagery
- Exhaustive Goodness of Fit Via Smoothed Inference and Graphics
- On the use of random forest for two-sample testing
- Minimax optimality of permutation tests
- Probabilistic multi-resolution scanning for two-sample differences
- Local significant differences from nonparametric two-sample tests
- Model-independent detection of new physics signals using interpretable semisupervised classifier tests
- A new set of tools for goodness-of-fit validation
- Asymptotic Distribution-Free Independence Test for High-Dimension Data
- ODC and ROC curves, comparison curves and stochastic dominance