Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (Q2484030)

From MaRDI portal
scientific article
Language Label Description Also known as
English
Outlier detection under interval uncertainty: algorithmic solvability and computational complexity
scientific article

    Statements

    Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (English)
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    2 August 2005
    0 references
    The purpose of the present paper is to investigate outlier detection problems under the interval uncertainty approach, to provide an analysis of their computational complexity, to obtain new efficient algorithms that solve some of these problems (that are tractable under reasonable conditions), and to take into consideration related open problems. The traditional engineering approach to outlier detection takes the following steps: (a) The measurement, ``normal'' values \(x_1, x_2, \dots, x_n\) are collected. (b) For these normal values the sample average \(E = (x_1 + \cdots+ x_n) / n\) and the standard deviation \(\sigma = \sqrt V\) are computed, where \(V = [(x_1 - E)^2 +\cdots + (x_n - E)^2] / n.\) (c) A new measurement result \(x\) is classified as an outlier if \(x\) is outside the \(k_0\sigma\) interval \([L, U]\), where \(L = E - k_0\sigma, U = E + k_0\sigma\), and \(k_0 > 1\) is some preselected parameter (most frequently, \(k_0 = 2, 3\), or 6). In real life, the normal values \(x_1,\dots, x_n\) are situated within certain interval ranges, and for different values of \(x_i\) \((i = 1,\dots,n)\) within its interval, one get different bounds \(L\) and \(U\) of the \(k_0\sigma\) interval. Detecting now the outliers requires to obtain: (1) the possible outliers, defined as located outside of (at least) one of the possible \(k_0\sigma\) intervals \([L, U]\), and (2) the guaranteed outliers, defined as being located outside of all possible \(k_0\sigma\) intervals \([L, U]\). It is thus essential to compute the exact ranges of the interval bounds \(L\) and \(U\). The main results may now be summarized as follows: (i) Computing the exact ranges of the outlier interval bounds \(L\) and \(U\) is proved to be, in the general case, an NP-hard problem. (ii) The authors propose efficient (quadratic-time) algorithms that compute the ranges of \(L\) and \(U\) under reasonable conditions. (iii) Once a value is identified as an outlier for a fixed parameter value \(k_0\), the paper shows how to find out to what degree this value is an outlier, i.e. what is the largest value \(k_0\) for which this value is an outlier for sure.
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    outlier detection problem
    0 references
    interval uncertainty approach
    0 references
    computational complexity
    0 references
    possible outliers
    0 references
    guaranteed outliers
    0 references
    interval ranges
    0 references
    exact ranges of interval bounds
    0 references
    efficient quadratic-time algorithms
    0 references
    nonlinear optimization
    0 references
    0 references