Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (Q2484030)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Outlier detection under interval uncertainty: algorithmic solvability and computational complexity |
scientific article |
Statements
Outlier detection under interval uncertainty: algorithmic solvability and computational complexity (English)
0 references
2 August 2005
0 references
The purpose of the present paper is to investigate outlier detection problems under the interval uncertainty approach, to provide an analysis of their computational complexity, to obtain new efficient algorithms that solve some of these problems (that are tractable under reasonable conditions), and to take into consideration related open problems. The traditional engineering approach to outlier detection takes the following steps: (a) The measurement, ``normal'' values \(x_1, x_2, \dots, x_n\) are collected. (b) For these normal values the sample average \(E = (x_1 + \cdots+ x_n) / n\) and the standard deviation \(\sigma = \sqrt V\) are computed, where \(V = [(x_1 - E)^2 +\cdots + (x_n - E)^2] / n.\) (c) A new measurement result \(x\) is classified as an outlier if \(x\) is outside the \(k_0\sigma\) interval \([L, U]\), where \(L = E - k_0\sigma, U = E + k_0\sigma\), and \(k_0 > 1\) is some preselected parameter (most frequently, \(k_0 = 2, 3\), or 6). In real life, the normal values \(x_1,\dots, x_n\) are situated within certain interval ranges, and for different values of \(x_i\) \((i = 1,\dots,n)\) within its interval, one get different bounds \(L\) and \(U\) of the \(k_0\sigma\) interval. Detecting now the outliers requires to obtain: (1) the possible outliers, defined as located outside of (at least) one of the possible \(k_0\sigma\) intervals \([L, U]\), and (2) the guaranteed outliers, defined as being located outside of all possible \(k_0\sigma\) intervals \([L, U]\). It is thus essential to compute the exact ranges of the interval bounds \(L\) and \(U\). The main results may now be summarized as follows: (i) Computing the exact ranges of the outlier interval bounds \(L\) and \(U\) is proved to be, in the general case, an NP-hard problem. (ii) The authors propose efficient (quadratic-time) algorithms that compute the ranges of \(L\) and \(U\) under reasonable conditions. (iii) Once a value is identified as an outlier for a fixed parameter value \(k_0\), the paper shows how to find out to what degree this value is an outlier, i.e. what is the largest value \(k_0\) for which this value is an outlier for sure.
0 references
outlier detection problem
0 references
interval uncertainty approach
0 references
computational complexity
0 references
possible outliers
0 references
guaranteed outliers
0 references
interval ranges
0 references
exact ranges of interval bounds
0 references
efficient quadratic-time algorithms
0 references
nonlinear optimization
0 references
0 references