Cross-study replicability in cluster analysis
Publication:6166879
DOI: 10.1214/22-STS871; arXiv: 2202.01910; MaRDI QID: Q6166879; FDO: Q6166879
Authors: Lorenzo Masoero, Emma G. Thomas, Giovanni Parmigiani, Svitlana Tyekucheva, Lorenzo Trippa
Publication date: 7 July 2023
Published in: Statistical Science
Abstract: In cancer research, clustering techniques are widely used for exploratory analyses and dimensionality reduction, playing a critical role in the identification of novel cancer subtypes, often with direct implications for patient management. As data collected by multiple research groups grows, it is increasingly feasible to investigate the replicability of clustering procedures, that is, their ability to consistently recover biologically meaningful clusters across several datasets. In this paper, we review existing methods to assess replicability of clustering analyses, and discuss a framework for evaluating cross-study clustering replicability, useful when two or more studies are available. These approaches can be applied to any clustering algorithm and can employ different measures of similarity between partitions to quantify replicability, globally (i.e. for the whole sample) as well as locally (i.e. for individual clusters). Using experiments on synthetic and real gene expression data, we illustrate the utility of replicability metrics to evaluate if the same clusters are identified consistently across a collection of datasets.
Full work available at URL: https://arxiv.org/abs/2202.01910
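The abstract describes quantifying replicability with a measure of similarity between partitions. As a minimal sketch of one such measure (the adjusted Rand index, a chance-corrected similarity index; this is not claimed to be the authors' procedure), the function below compares two labelings of the same samples. The function name and the example label vectors are hypothetical: one vector stands for clusters obtained by analyzing a study on its own, the other for labels assigned to the same samples by a clustering rule learned on a different study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same samples")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: the partitions trivially agree

    # Count sample pairs that share a cluster in both partitions, in A, and in B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = pairs_a * pairs_b / total_pairs
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (pairs_both - expected) / (max_index - expected)


# Hypothetical labels for six samples of one study.
labels_within_study = [0, 0, 0, 1, 1, 1]   # clustering fit on the study itself
labels_transferred = [1, 1, 1, 0, 0, 0]    # rule learned on another study
score = adjusted_rand_index(labels_within_study, labels_transferred)
print(score)
```

The index ignores how clusters are numbered, so the two hypothetical labelings above count as identical partitions. Values near 1 indicate that the two partitions agree, and values near 0 indicate agreement no better than chance.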
Cites Work
- Are clusters found in one dataset present in another dataset?
- Clustering by passing messages between data points
- Visualizing data using t-SNE
- Title not available
- Least squares quantization in PCM
- Random partition models with regression on covariates
- Bayesian cluster analysis: point estimation and credible balls (with discussion)
- Stability
- Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance
- On similarity indices and correction for chance agreement
- Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data
- Cluster-wise assessment of cluster stability
- Selection of the number of clusters via the bootstrap method
- Stability-Based Validation of Clustering Solutions
- Resampling method for unsupervised estimation of cluster validity
- Stability of k-Means Clustering
- Problems in gene clustering based on gene expression data
- Definitions, methods, and applications in interpretable machine learning
- Bayesian nonparametric cross-study validation of prediction methods
- Constructing a high-dimensional \(k\)NN-graph using a Z-order curve
- Cross-study replicability in cluster analysis
Cited In (1)