Anomaly Detection in High Dimensional Data (Q74767)

From MaRDI portal

scientific article from arXiv

      Statements

      12 August 2019
      arXiv classification: stat.ML, cs.LG, stat.AP
      This article introduces the stray algorithm, a novel method for detecting anomalies in high-dimensional data. Developed to overcome performance limitations of existing algorithms such as HDoutliers, it identifies anomalies using extreme value theory, calculating thresholds for large distance gaps between observations. Tests on both synthetic and real datasets show that the stray algorithm outperforms its predecessor in both accuracy and computational efficiency. The algorithm is available as an open-source R package. (English)
      This paper introduces the stray algorithm, a new method for spotting unusual data points in data with many variables. It was created because an earlier tool, HDoutliers, has flaws that make it less useful under some conditions. The new algorithm measures how far each point stands apart from the rest and uses a branch of statistics called extreme value theory to decide which gaps are large enough to signal an anomaly. Tests on simulated and real data showed that stray finds anomalies more accurately and more quickly than its predecessor. It is freely available as an R package. (English)
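      The idea of flagging points that sit above a large gap in the sorted distances can be illustrated with a minimal Python sketch. It is not the stray algorithm itself and not the R package's API: the function names and the `gap_factor` parameter are hypothetical, and the threshold is a simple multiple of the typical gap rather than the extreme value theory threshold described in the summaries.

```python
import math


def nearest_neighbor_distances(points):
    """Return each point's Euclidean distance to its nearest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def flag_by_gap(points, gap_factor=5.0):
    """Return indices of points lying above the first large gap in the scores.

    Assumption: `gap_factor` is a hypothetical tuning knob. A gap counts as
    "large" when it exceeds gap_factor times the mean gap in the lower half
    of the sorted scores. This stands in for the extreme value theory
    threshold, which is not reproduced here.
    """
    if len(points) < 4:
        return []  # too few points to estimate a typical gap

    scores = nearest_neighbor_distances(points)
    # Rank points from the most typical (small score) to the most isolated.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranked = [scores[i] for i in order]
    gaps = [b - a for a, b in zip(ranked, ranked[1:])]

    half = len(ranked) // 2
    typical_gap = sum(gaps[:half]) / half

    # Search only the upper half of the ranking for the first large gap.
    for k in range(half, len(gaps)):
        if gaps[k] > gap_factor * typical_gap:
            return sorted(order[k + 1:])
    return []


# Eight points spaced along a line, plus one point far from all of them.
points = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.3, 0.0), (4.6, 0.0),
          (6.0, 0.0), (7.5, 0.0), (9.1, 0.0), (50.0, 0.0)]
print(flag_by_gap(points))  # [8]
```

      Searching only the upper half of the ranking keeps the bulk of the data from being flagged. The brute-force neighbor search is quadratic in the number of points and is meant only for small illustrative inputs.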

      Identifiers
