Using Subset Log-Likelihoods to Trim Outliers in Gaussian Mixture Models

From MaRDI portal
Publication: Q84825

DOI: 10.48550/ARXIV.1907.01136
arXiv: 1907.01136
MaRDI QID: Q84825


Authors: Katharine M. Clark, Paul D. McNicholas


Publication date: 2 July 2019

Abstract: Unsupervised classification, or clustering, is a problem often plagued by outliers, yet there is a paucity of work on handling outliers in unsupervised classification. Outlier algorithms tend to fall into two broad categories: outlier inclusion methods and trimming methods, which often require pre-specification of the number of points to remove. The fact that sample Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset finite Gaussian mixture models. An algorithm is proposed that removes the least likely points, which are deemed outliers, until the log-likelihoods adhere to the reference distribution. This results in a trimming method which inherently estimates the number of outliers present.
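The distributional fact the abstract relies on, that the sample Mahalanobis distance is beta-distributed, can be checked numerically. The sketch below is not the authors' code; it simulates a single Gaussian sample and compares the scaled squared sample Mahalanobis distances against the classical Beta(d/2, (n-d-1)/2) reference, using NumPy and SciPy. The sample size, dimension, and seed are illustrative choices.

```python
import numpy as np
from scipy import stats

# Classical result underpinning the paper's reference distribution:
# for a sample of size n from a d-variate Gaussian, the scaled squared
# sample Mahalanobis distance  n * D_i^2 / (n - 1)^2  follows a
# Beta(d/2, (n - d - 1)/2) distribution.
rng = np.random.default_rng(0)          # illustrative seed
n, d = 500, 3                           # illustrative sample size and dimension
X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)

mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)             # sample covariance (divisor n - 1)
Sinv = np.linalg.inv(S)
diff = X - mu
D2 = np.einsum('ij,jk,ik->i', diff, Sinv, diff)  # squared Mahalanobis distances

u = n * D2 / (n - 1) ** 2               # should lie in (0, 1) and be beta-distributed
stat, pval = stats.kstest(u, stats.beta(d / 2, (n - d - 1) / 2).cdf)
print(f"KS statistic = {stat:.3f}")
```

A trimming method in the spirit of the abstract would then repeatedly refit the mixture with the least likely point removed, stopping once the subset log-likelihoods are consistent with the derived reference distribution; the check above is only the distributional building block, not the full algorithm.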

Cited In (1)