Expected similarity estimation for large-scale batch and streaming anomaly detection

DOI 10.1007/S10994-016-5567-7, MaRDI QID Q1689600

Authors Markus Schneider, Wolfgang Ertel, Fabio Ramos

Publication date 12 January 2018

Published in Machine Learning

Full work available at URL https://arxiv.org/abs/1601.06602

kernel methods, anomaly detection, large-scale data, Hilbert space embedding, mean map

Nonparametric estimation (62G05)
Classification and discrimination; cluster analysis (statistical aspects) (62H30)
Statistical aspects of big data and data science (62R07)
Pattern recognition, speech recognition (68T10)
Informational aspects of data analysis and big data (94A16)

Abstract: We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
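
The scoring rule in the abstract (an inner product between a new point's feature map and an embedding of the regular data) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the class name ExposeSketch, its parameters, the choice of a Gaussian kernel, and the use of random Fourier features as an explicit approximate feature map, which is one way to get a fixed-size model. The optional decay argument shows one plausible form of concept drift adaptation (exponential forgetting) and is likewise an assumption.

```python
import numpy as np


class ExposeSketch:
    """Illustrative expected-similarity scorer (hypothetical, not the authors' code)."""

    def __init__(self, dim, n_features=500, bandwidth=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features for a Gaussian kernel (assumed kernel choice):
        # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) ~= phi(x) @ phi(y)
        self.W = rng.normal(scale=1.0 / bandwidth, size=(dim, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        # Approximate mean embedding of the regular data; fixed size, so
        # memory does not grow with the number of observations.
        self.mean = np.zeros(n_features)
        self.count = 0

    def features(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return self.scale * np.cos(X @ self.W + self.b)

    def fit(self, X):
        # Batch learning: a single pass, linear in the number of samples.
        Phi = self.features(X)
        self.mean = Phi.mean(axis=0)
        self.count = Phi.shape[0]
        return self

    def update(self, x, decay=None):
        # Online learning: constant work per instance.
        phi = self.features(x)[0]
        self.count += 1
        if decay is None:
            # Running average, equivalent to refitting on all data seen so far.
            self.mean += (phi - self.mean) / self.count
        else:
            # Exponential forgetting (assumed drift strategy): older
            # observations lose weight geometrically.
            self.mean = (1.0 - decay) * self.mean + decay * phi
        return self

    def score(self, X):
        # Expected similarity: inner product with the mean embedding.
        # Low scores suggest anomalies.
        return self.features(X) @ self.mean
```

In this sketch a model fitted on regular data would be queried with score(points), and points whose score falls below a chosen threshold would be flagged; how to pick that threshold is not specified here.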

Cited in (6)