Randomized incomplete \(U\)-statistics in high dimensions (Q2284368)

From MaRDI portal
scientific article

    Statements

    Randomized incomplete \(U\)-statistics in high dimensions (English)
    15 January 2020
    The authors consider the problem of statistical inference for the mean vector \(\mathbb{E}h(X_1,\ldots,X_r)\), based on independent and identically distributed data \(X_1,\ldots,X_n\) taking values in a measurable space \((S,\mathcal{S})\), where \(h:S^r\to\mathbb{R}^d\) is a fixed symmetric kernel. Their aim is to develop tools for the setting where \(d\) is possibly much larger than \(n\), but where \(n\) is also large. In this setting, the commonly used complete \(U\)-statistic \[ \frac{1}{|I_{n,r}|}\sum_{(i_1,\ldots,i_r)\in I_{n,r}}h(X_{i_1},\ldots,X_{i_r})\,, \] where \(I_{n,r}=\{(i_1,\ldots,i_r):1\le i_1<\cdots<i_r\le n\}\), requires \(|I_{n,r}|=\binom{n}{r}\) evaluations of \(h\) and therefore scales poorly. As a remedy, the authors propose two randomized incomplete \(U\)-statistics, in which the average is taken over only a randomly chosen subset of \(I_{n,r}\) rather than over the whole set. The first uses Bernoulli sampling (which, conditionally on the number of selected tuples, amounts to sampling without replacement), and the second uses sampling with replacement. Under suitable assumptions (including boundedness conditions), Gaussian approximation results with explicit rates of convergence are established for these randomized incomplete \(U\)-statistics, in both the nondegenerate and degenerate cases. Since the covariance matrix of the approximating Gaussian distribution depends on the unknown underlying distribution, fully data-dependent bootstrap procedures are developed to make these results usable in practice. The paper concludes with a simulation study that applies this framework to testing pairwise independence of the coordinates of a high-dimensional random vector, using several well-known statistics from the literature.
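The two sampling schemes described in the review can be illustrated with a minimal Python sketch. It is not the authors' implementation: the kernel `variance_kernel`, the function names, and the `budget` parameter (the target number of index tuples) are hypothetical choices made for illustration, and normalizing the Bernoulli version by the realized number of selected tuples is an assumption of this sketch.

```python
import itertools
import math
import random


def variance_kernel(x, y):
    """Illustrative symmetric kernel of order r = 2 with values in R^d.

    Its expectation is the vector of coordinate-wise variances.
    """
    return [(a - b) ** 2 / 2.0 for a, b in zip(x, y)]


def _average(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("no index tuples were selected")
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]


def complete_u(data, h, r):
    """Complete U-statistic: average h over all tuples i_1 < ... < i_r."""
    values = [h(*(data[i] for i in idx))
              for idx in itertools.combinations(range(len(data)), r)]
    return _average(values)


def incomplete_u_bernoulli(data, h, r, budget, rng):
    """Keep each tuple independently with probability budget / C(n, r).

    The loop still visits every tuple, but h is evaluated only on the
    selected ones; the average is over the realized selection (assumption).
    """
    n = len(data)
    p = min(1.0, budget / math.comb(n, r))
    values = [h(*(data[i] for i in idx))
              for idx in itertools.combinations(range(n), r)
              if rng.random() < p]
    return _average(values)


def incomplete_u_with_replacement(data, h, r, budget, rng):
    """Draw `budget` tuples independently and uniformly, with replacement."""
    n = len(data)
    values = []
    for _ in range(budget):
        # A uniformly random r-subset, sorted so that i_1 < ... < i_r.
        idx = sorted(rng.sample(range(n), r))
        values.append(h(*(data[i] for i in idx)))
    return _average(values)
```

For example, with `rng = random.Random(0)` and a list `data` of `n` vectors in \(\mathbb{R}^d\), calling `incomplete_u_with_replacement(data, variance_kernel, 2, budget, rng)` evaluates the kernel `budget` times instead of \(\binom{n}{2}\) times. When `budget` equals \(\binom{n}{r}\), the Bernoulli version selects every tuple and reduces to the complete \(U\)-statistic.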
    incomplete \(U\)-statistics
    randomized inference
    Gaussian approximation
    bootstrap
    divide and conquer
    Bernoulli sampling
    sampling with replacement