k-variance: a clustered notion of variance

Classification and discrimination; cluster analysis (statistical aspects) (62H30)
Optimal transportation (49Q22)
Order statistics; empirical distribution functions (62G30)

Abstract: We introduce \(k\)-variance, a generalization of variance built on the machinery of random bipartite matchings. \(k\)-variance measures the expected cost of matching two sets of \(k\) samples from a distribution to each other, capturing local rather than global information about a measure as \(k\) increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining \(k\)-variance and proving its basic properties, we provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of \(\mathbb{R}^{n}\). We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
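The stochastic approximation mentioned in the abstract can be illustrated with a minimal sketch for the one-dimensional case. The following choices are assumptions made for illustration and are not taken from the publication: the matching cost is the squared distance, the cost is averaged over the \(k\) matched pairs, and a factor of 1/2 is applied so that \(k = 1\) recovers the ordinary variance in expectation. In one dimension with a convex cost, the optimal matching pairs the sorted samples, so sorting stands in here for the linear program that a general-dimensional version would need. The function names are hypothetical.

```python
import random


def matching_cost_1d(xs, ys):
    """Mean squared cost of the optimal matching between two equal-size 1-D samples.

    For a convex cost on the real line, pairing the i-th smallest point of xs
    with the i-th smallest point of ys is optimal, so no LP solver is needed.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)


def estimate_k_variance_1d(sampler, k, trials=2000, seed=0):
    """Monte Carlo estimate of the expected matching cost between two k-samples.

    `sampler(rng)` must return one draw from the distribution of interest.
    The 1/2 normalization is an assumption of this sketch (see lead-in).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(k)]
        ys = [sampler(rng) for _ in range(k)]
        total += matching_cost_1d(xs, ys)
    return 0.5 * total / trials


if __name__ == "__main__":
    def standard_normal(rng):
        return rng.gauss(0.0, 1.0)

    for k in (1, 2, 8, 32):
        print(k, estimate_k_variance_1d(standard_normal, k))
```

Under these assumptions the estimate for a standard normal is close to 1 at \(k = 1\) and shrinks as \(k\) grows, which matches the abstract's description of the quantity becoming more local with larger \(k\).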

Cited in

(1)

Variance and clustering

