Expected similarity estimation for large-scale batch and streaming anomaly detection
Publication:1689600
Abstract: We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
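The abstract describes the score as an inner product between a feature map of the new point and an embedding (mean feature vector) of the regular data. A minimal sketch of that idea follows. It assumes a Gaussian kernel approximated with random Fourier features so that the embedding has a fixed size; that approximation, the class name `ExposeSketch`, and the parameters `n_features` and `gamma` are illustrative choices made here, not details taken from the abstract.

```python
import math
import random


class ExposeSketch:
    """Illustrative EXPoSE-style scorer (hypothetical name, not the authors' code).

    Assumes k(x, y) = exp(-gamma * ||x - y||^2), approximated by random
    Fourier features so the mean embedding is a fixed-length vector.
    """

    def __init__(self, dim, n_features=256, gamma=1.0, seed=0):
        rng = random.Random(seed)
        # For this kernel the random frequencies are drawn from N(0, 2 * gamma * I).
        sigma = math.sqrt(2.0 * gamma)
        self.weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
                        for _ in range(n_features)]
        self.offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
        self.scale = math.sqrt(2.0 / n_features)
        self.mean = [0.0] * n_features  # running mean embedding of regular data
        self.count = 0

    def _features(self, x):
        # Approximate feature map: z(x) . z(y) is close to k(x, y).
        return [self.scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
                for w, b in zip(self.weights, self.offsets)]

    def update(self, x):
        # Online step: cost depends on n_features and dim, not on points seen so far.
        self.count += 1
        step = 1.0 / self.count
        phi = self._features(x)
        self.mean = [m + step * (p - m) for m, p in zip(self.mean, phi)]

    def fit(self, data):
        # Batch learning is a single pass over the data, hence linear time.
        for x in data:
            self.update(x)
        return self

    def score(self, y):
        # Inner product with the mean embedding; lower means less similar
        # to the regular data, i.e. more anomalous.
        return sum(p * m for p, m in zip(self._features(y), self.mean))
```

Under these assumptions, `ExposeSketch(dim=2).fit(points).score(query)` should return a larger value for a query near the bulk of `points` than for one far away. The stored state is a single vector of length `n_features`, which is how the sketch keeps memory and prediction cost independent of the number of training points.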
Recommendations
- Constant time EXPected similarity estimation for large-scale anomaly detection
- DAMP: accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams
- A classification framework for anomaly detection
- Loda: lightweight on-line detector of anomalies
- Extreme value theory for anomaly detection -- the GPD classifier
Cites work
- scientific article; zbMATH DE number 1717334
- scientific article; zbMATH DE number 1934540
- scientific article; zbMATH DE number 2107521
- 10.1162/1532443041827925
- A Hilbert Space Embedding for Distributions
- A kernel two-sample test
- A survey of outlier detection methodologies
- A survey on concept drift adaptation
- Approximations of the critical region of the Friedman statistic
- Constant time EXPected similarity estimation for large-scale anomaly detection
- Equivalence of distance-based and RKHS-based statistics in hypothesis testing
- Estimating the support of a high-dimensional distribution
- Foundations of Modern Probability
- Infinite dimensional analysis. A hitchhiker's guide.
- Knowledge discovery from data streams.
- On the Convergence of Pattern Search Algorithms
- Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach
- Statistical comparisons of classifiers over multiple data sets
- Support Vector Machines
- Support vector data description
Cited in
- An ordinal anomaly probability algorithm for anomaly detection problems in massive data sets
- Loda: lightweight on-line detector of anomalies
- Unsupervised streaming anomaly detection for instrumented infrastructure
- A survey of outlier detection in high dimensional data streams
- DAMP: accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams
- Constant time EXPected similarity estimation for large-scale anomaly detection