Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (Q2484030)

From MaRDI portal





scientific article; zbMATH DE number 2190574
Language Label Description Also known as
default for all languages
No label defined
    English
    Outlier detection under interval uncertainty: algorithmic solvability and computational complexity
    scientific article; zbMATH DE number 2190574

      Statements

      Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (English)
      0 references
      0 references
      0 references
      0 references
      0 references
      0 references
      2 August 2005
      0 references
      The purpose of the present paper is to investigate outlier detection problems under the interval uncertainty approach, to provide an analysis of their computational complexity, to obtain new efficient algorithms that solve some of these problems (that are tractable under reasonable conditions), and to take into consideration related open problems. The traditional engineering approach to outlier detection takes the following steps: (a) The measurement, ``normal'' values \(x_1, x_2, \dots, x_n\) are collected. (b) For these normal values the sample average \(E = (x_1 + \cdots+ x_n) / n\) and the standard deviation \(\sigma = \sqrt V\) are computed, where \(V = [(x_1 - E)^2 +\cdots + (x_n - E)^2] / n.\) (c) A new measurement result \(x\) is classified as an outlier if \(x\) is outside the \(k_0\sigma\) interval \([L, U]\), where \(L = E - k_0\sigma, U = E + k_0\sigma\), and \(k_0 > 1\) is some preselected parameter (most frequently, \(k_0 = 2, 3\), or 6). In real life, the normal values \(x_1,\dots, x_n\) are situated within certain interval ranges, and for different values of \(x_i\) \((i = 1,\dots,n)\) within its interval, one get different bounds \(L\) and \(U\) of the \(k_0\sigma\) interval. Detecting now the outliers requires to obtain: (1) the possible outliers, defined as located outside of (at least) one of the possible \(k_0\sigma\) intervals \([L, U]\), and (2) the guaranteed outliers, defined as being located outside of all possible \(k_0\sigma\) intervals \([L, U]\). It is thus essential to compute the exact ranges of the interval bounds \(L\) and \(U\). The main results may now be summarized as follows: (i) Computing the exact ranges of the outlier interval bounds \(L\) and \(U\) is proved to be, in the general case, an NP-hard problem. (ii) The authors propose efficient (quadratic-time) algorithms that compute the ranges of \(L\) and \(U\) under reasonable conditions. (iii) Once a value is identified as an outlier for a fixed parameter value \(k_0\), the paper shows how to find out to what degree this value is an outlier, i.e. what is the largest value \(k_0\) for which this value is an outlier for sure.
      0 references
      outlier detection problem
      0 references
      interval uncertainty approach
      0 references
      computational complexity
      0 references
      possible outliers
      0 references
      guaranteed outliers
      0 references
      interval ranges
      0 references
      exact ranges of interval bounds
      0 references
      efficient quadratic-time algorithms
      0 references
      nonlinear optimization
      0 references

      Identifiers

      0 references
      0 references
      0 references
      0 references
      0 references
      0 references