Computing population variance and entropy under interval uncertainty: Linear-time algorithms (Q2481140)

Assume the measurements \(x_1,\dots,x_n\). Unlike as in classical statistics, where the values \(x_i\) are known, the authors assume that they only know the intervals \(\big[\underline x_i, \overline x_i\big]\) of possible values of~\(x_i\). Usually the intervals are of the form \(\big[\widetilde x_i-\Delta_i, \widetilde x_i+\Delta_i\big]\), where \(\widetilde x_i\) is the measurement result and \(\Delta_i\) is the known upper bound on the absolute value \(| \widetilde x_i| \) of the (unknown) measurement error \(\Delta_i=\widetilde x_i-x_i\); \(| \Delta x_i| \leq\Delta_i\). The authors are interested in computing the range \([\underline V,\overline V]\) of the population variance \(V={1\over n} \sum_{i=1}^{n}(x_i-E)^2\), where \(E={1\over n}\sum_{i=1}^n x_i\). While~\(\underline V\) can be computed efficiently, computing~\(\overline V\) is, in general, an NP hard problem. The main result is an algorithm enabling to compute~\(\underline V\) in linear time \(O(n)\) and~\(\overline V\) in \(O(n\log n)\) time in the general case. Moreover, one special situation is discussed when~\(\overline V\) can be also computed in linear time \(O(n)\), too. More precisely, this situation covers the case when no interval is a proper subinterval of another one. Finally, a linear time algorithm is also described for computing the range of the entropy \(S=-\sum_{i=1}^{n} p_i\log\,p_i\).

0 references

zbMATH Keywords

sample variance

0 references

entropy

0 references

interval data

0 references

reviewed by

Jaromír Antoch

0 references

MaRDI profile type