On inference validity of weighted U-statistics under data heterogeneity (Q1786572)

From MaRDI portal
scientific article

    Statements

    On inference validity of weighted U-statistics under data heterogeneity (English)
    24 September 2018
    Consider independent (but not necessarily identically distributed) data \(X_1,\ldots,X_n\) and the corresponding weighted U-statistic \[ U_n=\frac{(n-m)!}{n!}\sum_{\substack{1\leq i_1,\ldots,i_m\leq n\\ i_j\neq i_k\text{ for }j\neq k}}a_n(i_1,\ldots,i_m)\,h_n(X_{i_1},\ldots,X_{i_m})\,, \] where no symmetry assumptions are made on the weight function \(a_n\) or the kernel function \(h_n\). Dropping both the symmetry assumptions and the assumption that the data are IID is the key feature of this work.

    The main results of the paper are a central limit theorem, giving sufficient conditions for the asymptotic normality of \(U_n\) as \(n\rightarrow\infty\), and sufficient conditions for consistent bootstrap variance estimation (including a bounded second moment condition and control on the heterogeneity of the distributions of the \(X_i\)).

    These results are applied to Kendall's tau and the average-precision correlation, defined by \[ \tau_K=\frac{2}{n(n-1)}\sum_{i\neq j}\bigl[1(X_i>X_j)1(i<j)+1(X_j>X_i)1(j<i)\bigr]-1 \] and \[ \tau_{AP}=\frac{2}{n-1}\sum_{i=2}^n\frac{\sum_{j=1}^{i-1}1(X_j>X_i)}{i-1}-1\,, \] respectively, which share the same kernel function, \(1(y>x)\).

    The work is motivated by the analysis of \(\tau_{AP}\), which arises in information retrieval. Here, the \(X_i\) are the scores that a ranking algorithm assigns to webpages, listed in the order of the corresponding human rankings. \(\tau_{AP}\) is a rank correlation measure designed to evaluate the quality of a given ranking algorithm, giving more weight to errors near the top of the ranking.

    Numerical experiments illustrate the main results. They show, for example, that consistency of the bootstrap variance estimator is more sensitive to data heterogeneity than the central limit theorem is, and that the finite-sample bootstrap performance for \(\tau_{AP}\) seems to be generally better than that for \(\tau_K\). The proofs of the main results are combinatorial in flavour.
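    The definitions above can be checked numerically on small inputs. The following is a minimal Python sketch, not code from the paper: it evaluates \(\tau_K\) and \(\tau_{AP}\) directly from the displayed formulas, and also as instances of \(U_n\) with \(m=2\) and kernel \(h(x,y)=1(y>x)\). The function names and the weights \(a_n(i,j)=4\cdot 1(j<i)\) and \(a_n(i,j)=2n\cdot 1(j<i)/(i-1)\) are assumptions of this sketch, derived from the formulas above (so that \(\tau=U_n-1\) in both cases); the paper's own normalisation may differ.

```python
from itertools import permutations
from math import factorial


def weighted_u_statistic(x, a, h, m):
    """Brute-force U_n: normalised sum of a(i_1..i_m) * h(X_{i_1}..X_{i_m})
    over all ordered m-tuples of distinct 1-based indices. Cost is O(n^m)."""
    n = len(x)
    total = 0.0
    for idx in permutations(range(1, n + 1), m):
        total += a(*idx) * h(*(x[i - 1] for i in idx))
    return factorial(n - m) / factorial(n) * total


def kernel(x, y):
    """Shared kernel 1(y > x)."""
    return 1.0 if y > x else 0.0


def tau_kendall(x):
    """Kendall's tau, evaluated literally from the displayed formula."""
    n = len(x)
    total = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i != j:
                xi, xj = x[i - 1], x[j - 1]
                total += (xi > xj) * (i < j) + (xj > xi) * (j < i)
    return 2.0 * total / (n * (n - 1)) - 1.0


def tau_ap(x):
    """Average-precision correlation, evaluated from the displayed formula."""
    n = len(x)
    total = 0.0
    for i in range(2, n + 1):
        above = sum(x[j - 1] > x[i - 1] for j in range(1, i))
        total += above / (i - 1)
    return 2.0 * total / (n - 1) - 1.0


def tau_kendall_via_u(x):
    # Assumed weight: a_n(i, j) = 4 * 1(j < i), so that tau_K = U_n - 1.
    return weighted_u_statistic(
        x, lambda i, j: 4.0 if j < i else 0.0, kernel, 2) - 1.0


def tau_ap_via_u(x):
    # Assumed weight: a_n(i, j) = 2n * 1(j < i) / (i - 1), so tau_AP = U_n - 1.
    n = len(x)
    return weighted_u_statistic(
        x, lambda i, j: 2.0 * n / (i - 1) if j < i else 0.0, kernel, 2) - 1.0


if __name__ == "__main__":
    for scores in ([3, 2, 1], [2, 3, 1], [3, 1, 2]):
        print(scores, tau_kendall(scores), tau_ap(scores))
```

    Under these formulas a strictly decreasing sequence gives the value 1 for both statistics. The sequences (2, 3, 1) and (3, 1, 2) each contain one discordant pair, so both have \(\tau_K=1/3\), but \(\tau_{AP}\) is 0 for the first (error at the top) and 1/2 for the second (error at the bottom), which illustrates the heavier weighting of errors at high rankings.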
    weighted U-statistics
    bootstrap
    rank correlation
    average-precision correlation
    central limit theorem
    consistency