Randomized incomplete \(U\)-statistics in high dimensions (Q2284368)

From MaRDI portal





scientific article

      Statements

      Randomized incomplete \(U\)-statistics in high dimensions (English)
      15 January 2020
      The authors consider the problem of statistical inference for the mean vector \(\mathbb{E}h(X_1,\ldots,X_r)\), based on independent and identically distributed data \(X_1,\ldots,X_n\) taking values in a measurable space \((S,\mathcal{S})\), where \(h:S^r\to\mathbb{R}^d\) is a fixed, symmetric function. Their aim is to develop tools for the setting where \(d\) is possibly much larger than \(n\), but where \(n\) is also large. In this setting, the commonly used \(U\)-statistic \[ \frac{1}{|I_{n,r}|}\sum_{(i_1,\ldots,i_r)\in I_{n,r}}h(X_{i_1},\ldots,X_{i_r})\,, \] where \(I_{n,r}\) is the set of all \(r\)-tuples \((i_1,\ldots,i_r)\) with \(1\le i_1<\cdots<i_r\le n\), becomes computationally expensive, since \(|I_{n,r}|=\binom{n}{r}\) grows rapidly with \(n\). As a solution, the authors propose two randomized incomplete \(U\)-statistics, in which the average is taken over only a randomly chosen subset of \(I_{n,r}\) rather than over all elements of this set. The first of these uses Bernoulli sampling (or, equivalently, sampling without replacement), and the second uses sampling with replacement. Under some assumptions (such as boundedness assumptions), Gaussian approximation results with explicit rates of convergence are established for these randomized incomplete \(U\)-statistics, in both the nondegenerate and degenerate cases. Since the covariance matrix of the limiting Gaussian distribution depends on the unknown underlying distribution, fully data-dependent bootstrap techniques are developed which make these results applicable in practice. The paper concludes with a simulation study that investigates this framework in the setting of testing for pairwise independence of the elements of a high-dimensional random vector, using several well-known statistics from the literature.
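
The two sampling schemes described in the review can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `budget` parameter (the target number of sampled tuples), the choice to normalize the Bernoulli version by the realized number of selected tuples, and the example kernel \(h(x,y)=(x-y)^2/2\) (order \(r=2\), \(d=1\), whose mean is the variance) are all assumptions made here for illustration. The Bernoulli version below still loops over every tuple, so it shows the estimator itself rather than any computational saving.

```python
import itertools
import math
import random


def kernel(x, y):
    """Hypothetical symmetric kernel with r = 2 and d = 1: h(x, y) = (x - y)^2 / 2."""
    return [0.5 * (x - y) ** 2]


def _add(total, value):
    """Add two d-dimensional vectors stored as lists (total may be None)."""
    return list(value) if total is None else [a + b for a, b in zip(total, value)]


def complete_u_statistic(data, h, r):
    """Average h over all strictly increasing r-tuples of indices."""
    total, count = None, 0
    for idx in itertools.combinations(range(len(data)), r):
        total = _add(total, h(*(data[i] for i in idx)))
        count += 1
    return [t / count for t in total]


def bernoulli_incomplete_u(data, h, r, budget, rng):
    """Keep each tuple independently with probability p = budget / C(n, r).

    Assumption of this sketch: the sum is divided by the realized number
    of selected tuples, which is random.
    """
    p = min(1.0, budget / math.comb(len(data), r))
    total, count = None, 0
    for idx in itertools.combinations(range(len(data)), r):
        if rng.random() < p:
            total = _add(total, h(*(data[i] for i in idx)))
            count += 1
    if count == 0:
        raise ValueError("no tuple was selected; increase the budget")
    return [t / count for t in total]


def replacement_incomplete_u(data, h, r, budget, rng):
    """Draw `budget` tuples uniformly from I_{n,r}, independently (with replacement)."""
    total = None
    for _ in range(budget):
        # A sorted random r-subset of indices is a uniform draw from I_{n,r}.
        idx = sorted(rng.sample(range(len(data)), r))
        total = _add(total, h(*(data[i] for i in idx)))
    return [t / budget for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print(complete_u_statistic(sample, kernel, 2))
    print(bernoulli_incomplete_u(sample, kernel, 2, 400, rng))
    print(replacement_incomplete_u(sample, kernel, 2, 400, rng))
```

In this sketch the with-replacement version evaluates the kernel exactly `budget` times regardless of \(n\) and \(r\), which is the kind of cost reduction that motivates incomplete \(U\)-statistics; the price is additional randomness in the estimate on top of the sampling variability of the data.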
      incomplete \(U\)-statistics
      randomized inference
      Gaussian approximation
      bootstrap
      divide and conquer
      Bernoulli sampling
      sampling with replacement