Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (Q2484030)

scientific article; zbMATH DE number 2190574

Language	Label	Description	Also known as
default for all languages	No label defined
English	Outlier detection under interval uncertainty: algorithmic solvability and computational complexity	scientific article; zbMATH DE number 2190574

Statements

instance of

scholarly article

0 references

title

Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (English)

0 references

0 references

0 references

0 references

0 references

0 references

0 references

2 August 2005

0 references

full work available at URL

https://digitalcommons.utep.edu/cgi/viewcontent.cgi?article=1283&context=cs_techrep

0 references

review text

The purpose of the present paper is to investigate outlier detection problems under the interval uncertainty approach, to provide an analysis of their computational complexity, to obtain new efficient algorithms that solve some of these problems (that are tractable under reasonable conditions), and to take into consideration related open problems. The traditional engineering approach to outlier detection takes the following steps: (a) The measurement, ``normal'' values \(x_1, x_2, \dots, x_n\) are collected. (b) For these normal values the sample average \(E = (x_1 + \cdots+ x_n) / n\) and the standard deviation \(\sigma = \sqrt V\) are computed, where \(V = [(x_1 - E)^2 +\cdots + (x_n - E)^2] / n.\) (c) A new measurement result \(x\) is classified as an outlier if \(x\) is outside the \(k_0\sigma\) interval \([L, U]\), where \(L = E - k_0\sigma, U = E + k_0\sigma\), and \(k_0 > 1\) is some preselected parameter (most frequently, \(k_0 = 2, 3\), or 6). In real life, the normal values \(x_1,\dots, x_n\) are situated within certain interval ranges, and for different values of \(x_i\) \((i = 1,\dots,n)\) within its interval, one get different bounds \(L\) and \(U\) of the \(k_0\sigma\) interval. Detecting now the outliers requires to obtain: (1) the possible outliers, defined as located outside of (at least) one of the possible \(k_0\sigma\) intervals \([L, U]\), and (2) the guaranteed outliers, defined as being located outside of all possible \(k_0\sigma\) intervals \([L, U]\). It is thus essential to compute the exact ranges of the interval bounds \(L\) and \(U\). The main results may now be summarized as follows: (i) Computing the exact ranges of the outlier interval bounds \(L\) and \(U\) is proved to be, in the general case, an NP-hard problem. (ii) The authors propose efficient (quadratic-time) algorithms that compute the ranges of \(L\) and \(U\) under reasonable conditions. (iii) Once a value is identified as an outlier for a fixed parameter value \(k_0\), the paper shows how to find out to what degree this value is an outlier, i.e. what is the largest value \(k_0\) for which this value is an outlier for sure.

0 references

reviewed by

Neculai Curteanu

0 references

zbMATH Keywords

outlier detection problem

0 references

interval uncertainty approach

0 references

computational complexity

0 references

possible outliers