Using Subset Log-Likelihoods to Trim Outliers in Gaussian Mixture Models

From MaRDI portal




Abstract: Unsupervised classification, or clustering, is a problem often plagued by outliers, yet there is a paucity of work on handling outliers in unsupervised classification. Outlier algorithms tend to fall into two broad categories: outlier inclusion methods and trimming methods, which often require pre-specification of the number of points to remove. The fact that sample Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset finite Gaussian mixture models. An algorithm is proposed that removes the least likely points, which are deemed outliers, until the log-likelihoods adhere to the reference distribution. This results in a trimming method which inherently estimates the number of outliers present.
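The beta-distribution fact the abstract relies on can be checked numerically. The sketch below is not taken from the publication. It assumes the standard result that, for n points from a p-dimensional Gaussian, the squared Mahalanobis distance D_i^2 computed with the sample mean and the unbiased sample covariance satisfies n/(n-1)^2 * D_i^2 ~ Beta(p/2, (n-p-1)/2); the paper's exact formulation may differ. The function names, the choice p = 2 (which gives a closed-form Beta CDF) and the simulated data are illustrative assumptions.

```python
import math
import random


def scaled_mahalanobis_2d(points):
    """Return n/(n-1)^2 * D_i^2 for 2-D points, where D_i is the Mahalanobis
    distance based on the sample mean and unbiased sample covariance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Unbiased sample covariance entries (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    scale = n / (n - 1) ** 2
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form using the closed-form inverse of a 2x2 matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        out.append(scale * d2)
    return out


def ks_distance_beta1(values, b):
    """Kolmogorov-Smirnov distance between the empirical CDF of `values`
    and Beta(1, b), whose CDF is 1 - (1 - x)^b on [0, 1]."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - (1.0 - x) ** b
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


# Simulated data (an assumption for illustration): correlated bivariate normal.
rng = random.Random(0)
n, p = 500, 2
points = []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    points.append((z1, 0.6 * z1 + 0.8 * z2))

scaled = scaled_mahalanobis_2d(points)
# With p = 2 the reference distribution is Beta(1, (n - 3) / 2).
ks = ks_distance_beta1(scaled, (n - p - 1) / 2)

print("mean of scaled distances:", sum(scaled) / n)
print("Beta mean p/(n-1):       ", p / (n - 1))
print("KS distance to Beta:     ", ks)
```

The mean of the scaled distances equals p/(n-1) exactly (up to rounding), because the squared distances always sum to (n-1)p. The KS distance gives a rough indication of how closely the empirical distribution follows the beta reference. A trimming procedure of the kind the abstract describes would compare a related statistic (subset log-likelihoods) against its reference distribution; that step is not reproduced here.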





Cited in (1)






