The remarkable simplicity of very high dimensional data: application of model-based clustering
Publication:263091
DOI: 10.1007/S00357-009-9037-9
zbMATH Open: 1337.62136
arXiv: 0805.2756
OpenAlex: W2157267184
MaRDI QID: Q263091
FDO: Q263091
Authors: F. Murtagh
Publication date: 4 April 2016
Published in: Journal of Classification
Abstract: An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We discuss also application to very high frequency time series segmentation and modeling.
Full work available at URL: https://arxiv.org/abs/0805.2756
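The abstract's central idea, quantifying the degree of ultrametricity in a data set, can be illustrated with a small sketch. In an ultrametric space every triangle is isosceles with a small base, that is, its two longest sides are equal. The code below samples triplets of points and reports the fraction whose two longest sides agree within a relative tolerance. This is an illustrative stand-in and not necessarily the coefficient used in the paper; the function names, the tolerance `rel_tol`, the sample sizes, and the uniform test data are all assumptions made for this example.

```python
import numpy as np


def pairwise_euclidean(points):
    """Return the full matrix of Euclidean distances between rows of `points`."""
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(d2, 0.0))


def ultrametric_fraction(dist, n_triplets=2000, rel_tol=0.05, seed=0):
    """Fraction of sampled triangles whose two longest sides are nearly equal.

    Hypothetical helper for illustration: a triangle counts as
    "ultrametric-like" when (longest - second longest) <= rel_tol * longest.
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    hits = 0
    for _ in range(n_triplets):
        # Draw three distinct point indices.
        i, j, k = rng.choice(n, size=3, replace=False)
        a, b, c = sorted((dist[i, j], dist[i, k], dist[j, k]))
        if c - b <= rel_tol * c:
            hits += 1
    return hits / n_triplets


# Assumed test data: uniform random points in low and high dimension.
data_rng = np.random.default_rng(42)
low = ultrametric_fraction(pairwise_euclidean(data_rng.random((60, 2))))
high = ultrametric_fraction(pairwise_euclidean(data_rng.random((60, 2000))))
print(f"dimension 2:    {low:.3f}")
print(f"dimension 2000: {high:.3f}")
```

Under these assumptions the high-dimensional sample should score much closer to 1 than the two-dimensional one, because pairwise distances between random points concentrate around a common value as dimensionality grows. The exact values depend on the tolerance and on the data distribution.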
Recommendations
- From Data to the Physics Using Ultrametrics: New Results in High Dimensional Data Analysis
- Hierarchical Clustering of Massive, High Dimensional Data Sets by Exploiting Ultrametric Embedding
- Ultrametric embedding: application to data fingerprinting and to fast data clustering
- Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets
- On the logistic behaviour of the topological ultrametricity of data
Cites Work
- Estimating the dimension of a model
- Geometric representation of association between categories
- Bayes Factors
- Title not available
- The analytical solution of the additive constant problem
- Title not available
- Title not available
- Title not available
- Geometric Representation of High Dimension, Low Sample Size Data
- Neighborliness of randomly projected simplices in high dimensions
- Dissimilarity and distance coefficients in automation-supported thesauri
- On a general transformation making a dissimilarity matrix Euclidean
- Hierarchical Clustering of Massive, High Dimensional Data Sets by Exploiting Ultrametric Embedding
- The high-dimension, low-sample-size geometric representation holds under mild conditions
- From Data to the Physics Using Ultrametrics: New Results in High Dimensional Data Analysis
- Hierarchical trees can be perfectly scaled in one dimension
- On ultrametricity, data coding, and computation
Cited In (18)
- Model-based clustering of high-dimensional data: a review
- Fast, linear time hierarchical clustering using the Baire metric
- Ultrametric embedding: application to data fingerprinting and to fast data clustering
- Model based clustering of high-dimensional binary data
- An algorithm for deciding the number of clusters and validation using simulated data with application to exploring crop population structure
- From Data to the Physics Using Ultrametrics: New Results in High Dimensional Data Analysis
- On the limits of clustering in high dimensions via cost functions
- Symmetry in data mining and analysis: a unifying view based on hierarchy
- A survey on unsupervised outlier detection in high‐dimensional numerical data
- Thinking ultrametrically, thinking \(p\)-adically
- Reduced-Rank Modeling for High-Dimensional Model-Based Clustering
- Discussion of: Treelets -- an adaptive multi-scale basis for sparse unordered data
- Direct reading algorithm for hierarchical clustering
- From data to the \(p\)-adic or ultrametric model
- On the logistic behaviour of the topological ultrametricity of data
- Finding ultrametricity in data using topology
- Ultrametricity indices for the Euclidean and Boolean hypercubes
- Ultrametricity of dissimilarity spaces and its significance for data mining