On inference validity of weighted U-statistics under data heterogeneity (Q1786572)
scientific article
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | On inference validity of weighted U-statistics under data heterogeneity | scientific article | |
Statements
On inference validity of weighted U-statistics under data heterogeneity (English)
Publication date: 24 September 2018
Consider independent (but not necessarily identically distributed) data \(X_1,\ldots,X_n\) and the corresponding weighted U-statistic \[ U_n=\frac{(n-m)!}{n!}\sum_{\substack{1\leq i_1,\ldots,i_m\leq n\\ i_j\neq i_k,\ j\neq k}}a_n(i_1,\ldots,i_m)\,h_n(X_{i_1},\ldots,X_{i_m})\,, \] where no symmetry assumptions are made on the weight function \(a_n\) or the kernel function \(h_n\). The removal of both the symmetry assumptions and the IID assumption on the underlying data is the key feature of this work.

The main results of the paper are a central limit theorem giving sufficient conditions for the asymptotic normality of \(U_n\) as \(n\rightarrow\infty\), and sufficient conditions for consistent bootstrap variance estimation, including a bounded second moment condition and control on the heterogeneity of the distributions of the \(X_i\). These results are applied to Kendall's tau and the average-precision correlation, defined by \[ \tau_K=\frac{2}{n(n-1)}\sum_{i\neq j}\bigl[1(X_i>X_j)1(i < j)+1(X_j>X_i)1(j < i)\bigr]-1 \] and \[ \tau_{AP}=\frac{2}{n-1}\sum_{i=2}^n\frac{\sum_{j=1}^{i-1}1(X_j>X_i)}{i-1}-1\,, \] respectively, which share the same kernel function, \(1(y>x)\).

The work is motivated by the analysis of \(\tau_{AP}\), which arises in information retrieval. There, the \(X_i\) correspond to the scores assigned to webpages by a ranking algorithm, with the webpages ordered according to a reference human ranking. \(\tau_{AP}\) is a rank correlation measure designed to evaluate the quality of a ranking algorithm, with more weight given to errors near the top of the ranking. Numerical experiments illustrate the main results and show, for example, that the consistency of the bootstrap variance estimate is more sensitive to data heterogeneity than the central limit theorem, and that finite-sample bootstrap performance for \(\tau_{AP}\) appears generally better than that for \(\tau_K\). The proofs of the main results are combinatorial in flavour.
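As a concrete illustration of the two statistics discussed above, the following is a minimal Python sketch (not taken from the paper) that evaluates \(\tau_K\) and \(\tau_{AP}\) directly from the displayed formulas and attaches a naive resampling bootstrap estimate of the variance. The function names, the simulated heterogeneous data, and the resampling scheme are illustrative assumptions; the bootstrap analysed in the paper may differ from this naive version.

```python
import numpy as np

def tau_k(x):
    """Kendall's tau between the index order 1..n and the values x,
    evaluated literally from the displayed formula for tau_K."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                s += int(x[i] > x[j]) * int(i < j) + int(x[j] > x[i]) * int(j < i)
    return 2.0 * s / (n * (n - 1)) - 1.0

def tau_ap(x):
    """Average-precision correlation: the top-weighted analogue of tau_K,
    evaluated literally from the displayed formula (0-based indexing)."""
    n = len(x)
    s = 0.0
    for i in range(1, n):  # corresponds to i = 2, ..., n in the formula
        s += sum(x[j] > x[i] for j in range(i)) / i
    return 2.0 * s / (n - 1) - 1.0

def bootstrap_var(x, stat, B=500, rng=None):
    """Naive bootstrap variance estimate: resample the observations with
    replacement, place them back into positions 1..n, and recompute the
    statistic. Illustrative only; not necessarily the scheme in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    reps = [stat(rng.choice(x, size=len(x), replace=True)) for _ in range(B)]
    return float(np.var(reps, ddof=1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    # Heterogeneous (independent, non-identically distributed) toy data:
    # decreasing means mimic scores that tend to fall with the human rank,
    # alternating scales introduce heterogeneity across positions.
    x = rng.normal(loc=-np.arange(n) / n, scale=1.0 + 0.5 * (np.arange(n) % 2))
    print("tau_K  =", round(tau_k(x), 3))
    print("tau_AP =", round(tau_ap(x), 3))
    print("bootstrap var of tau_AP =", round(bootstrap_var(x, tau_ap, rng=rng), 5))
```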
weighted U-statistics
bootstrap
rank correlation
average-precision correlation
central limit theorem
consistency