Estimating the Number of Clusters in a Data Set Via the Gap Statistic

From MaRDI portal
Publication:65481

DOI: 10.1111/1467-9868.00293, zbMath: 0979.62046, OpenAlex: W2071949631, MaRDI QID: Q65481

Robert Tibshirani, Guenther Walther, Trevor Hastie

Publication date: 1 July 2001

Published in: Journal of the Royal Statistical Society Series B: Statistical Methodology

Full work available at URL: https://doi.org/10.1111/1467-9868.00293



Related Items

DESPOTA: dendrogram slicing through a permutation test approach, Problems in gene clustering based on gene expression data, On a resampling approach for tests on the number of clusters with mixture model-based clustering of tissue samples, Finding predictive gene groups from microarray data, Temporal gap statistic: a new internal index to validate time series clustering, A non-parametric method to estimate the number of clusters, Partition of interval-valued observations using regression, Linearized alternating direction method of multipliers for sparse group and fused Lasso models, The cluster graphical Lasso for improved estimation of Gaussian graphical models, Estimating robot strengths with application to selection of alliance members in FIRST robotics competitions, Clustering with the average silhouette width, Generalized \(k\)-means in GLMs with applications to the outbreak of COVID-19 in the United States, A review on spectral clustering and stochastic block models, Sparse optimal discriminant clustering, A divisive clustering method for functional data with special consideration of outliers, Clusterability assessment for Gaussian mixture models, KM-MIC: an improved maximum information coefficient based on K-medoids clustering, Some clustering-based exact distribution-free \(k\)-sample tests applicable to high dimension, low sample size data, Clustering Chlorophyll-a satellite data using quantiles, Spatial associations in global household bicycle ownership, New bounding and decomposition approaches for MILP investment problems: multi-area transmission and generation planning under policy constraints, PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection, Variable selection for high-dimensional genomic data with censored outcomes using group Lasso prior, Identification of relevant subtypes via preweighted sparse clustering, A simple approach to sparse clustering, Use of symmetry and stability for data clustering, Strong consistency of 
\(k\)-parameters clustering, A statistical view of clustering performance through the theory of \(U\)-processes, Stability-based validation of bicluster solutions, Sensor fusion for SLAM based on information theory, Novel soft subspace clustering with multi-objective evolutionary approach for high-dimensional data, Variational learning of a Dirichlet process of generalized Dirichlet distributions for simultaneous clustering and feature selection, Data filtering for cluster analysis by \(\ell _0\)-norm regularization, Hypothesis setting and order statistic for robust genomic meta-analysis, Resampling approach for cluster model selection, Variance-based cluster selection criteria in a \(K\)-means framework for one-mode dissimilarity data, Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis, Identifying cluster number for subspace projected functional data clustering, Cluster analysis of longitudinal profiles with subgroups, Temporal clustering of time series via threshold autoregressive models: application to commodity prices, Regularized \(k\)-means clustering of high-dimensional data and its asymptotic consistency, Clustering nonlinear, nonstationary time series using BSLEX, An application of the minimal spanning tree approach to the cluster stability problem, On the number of groups in clustering, MCS: A method for finding the number of clusters, Solution path clustering with adaptive concave penalty, Statistical challenges in functional genomics. 
(With comments and a rejoinder)., Comparison of three hypothesis testing approaches for the selection of the appropriate number of clusters of variables, Methods for merging Gaussian mixture components, Practical shape analysis and segmentation methods for point cloud models, On nonparametric feature filters in electromagnetic imaging, Mean field analysis of a spatial stochastic model of a gene regulatory network, Selection of variables in cluster analysis: An empirical comparison of eight procedures, Visual stability analysis for model selection in graded possibilistic clustering, Multiscale blind source separation, Clustering confidence sets, Linear grouping using orthogonal regression, Optimized profitability of LFP and NMC Li-ion batteries in residential PV applications, Sparse clustering of functional data, Using combinatorial optimization in model-based trimmed clustering with cardinality constraints, Bayesian Non-Parametric Factor Analysis for Longitudinal Spatial Surfaces, Joint adaptive mean-variance regularization and variance stabilization of high dimensional data, Vector quantization of amino acids: Analysis of the HIV V3 loop region, The Kullback information criterion for mixture regression models, Bayes factors in the presence of population stratification, Likelihood ratio test for partial sphericity in high and ultra-high dimensions, Cross-entropy clustering, Temporally consistent tone mapping of images and video using optimal \(K\)-means clustering, On the use of quantile regression to deal with heterogeneity: the case of multi-block data, Subject-treatment interactions in crossover trials: performance evaluation of subgrouping methods, An integrative pathway-based clinical-genomic model for cancer survival prediction, Pattern layer reduction for a generalized regression neural network by using a self-organizing map, High-dimensional variable selection with the plaid mixture model for clustering, Fast wavelet-based stochastic simulation using 
training images, A regionalisation approach for rainfall based on extremal dependence, A quasi-Bayesian perspective to online clustering, Sparse \(\ell_ {1}\) regularisation of matrix valued models for acoustic source characterisation, The hierarchical spectral merger algorithm: a new time series clustering procedure, A scalable solution framework for stochastic transmission and generation planning problems, Network modeling in biology: statistical methods for gene and brain networks, Selecting among multi-mode partitioning models of different complexities: a comparison of four model selection criteria, A semiparametric method for clustering mixed data, Dealing with label switching in mixture models under genuine multimodality, Subspace clustering of high-dimensional data: a predictive approach, Online phenotype discovery based on minimum classification error model, A multivariate uniformity test for the case of unknown support, An advancement in clustering via nonparametric density estimation, Optimality of spectral clustering in the Gaussian mixture model, Estimating the number of clusters in a ranking data context, CLUES: a non-parametric clustering method based on local shrinking, clusterSim, Assessing agreement of clustering methods with gene expression microarray data, Innovation in the cluster validating techniques, On the behaviour of \(K\)-means clustering of a dissimilarity matrix by means of full multidimensional scaling, A graph clustering approach to localization for adaptive covariance tuning in data assimilation based on state-observation mapping, An empirical comparison between stochastic and deterministic centroid initialisation for K-means variations, K-bMOM: A robust Lloyd-type clustering algorithm based on bootstrap median-of-means, Suboptimal comparison of partitions, Local stationarity in small area estimation models, Intelligent choice of the number of clusters in \(K\)-means clustering: an experimental study with different cluster spreads, A 
novel bagging approach for variable ranking and selection via a mixed importance measure, Clustering transformed compositional data using \(K\)-means, with applications in gene expression and bicycle sharing system data, A comparative analysis of clustering algorithms to identify the homogeneous rainfall gauge stations of Bangladesh, Overlapping radial basis function interpolants for spectrally accurate approximation of functions of eigenvalues with application to buckling of composite plates, Selection of the number of clusters in functional data analysis, Análisis de los conglomerados de precipitación y sus cambios estacionales sobre América Central para el período 1976-2015, Clustering Microarray Data: Theoretical and Practical Issues, Cluster-based reduced-order modelling of a mixing layer, Model-based hierarchical clustering with Bregman divergences and Fishers mixture model: application to depth image analysis, Bayesian nonparametric clustering and association studies for candidate SNP observations, Clustering Dynamics on Graphs: From Spectral Clustering to Mean Shift Through Fokker–Planck Interpolation, Cluster Data Streams with Noisy Variables, Poisson Kernel-Based Clustering on the Sphere: Convergence Properties, Identifiability, and a Method of Sampling, Self-learning \(K\)-means clustering: a global optimization approach, Determine the number of clusters by data augmentation, Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation, Unnamed Item, A new internal index based on density core for clustering validation, Stability-Based Validation of Clustering Solutions, Model selection strategies for determining the optimal number of overlapping clusters in additive overlapping partitional clustering, Dimension-reduced clustering of functional data via subspace separation, A modified self-updating clustering algorithm for application to dengue gene expression data, Clustering of gamma-ray bursts through kernel principal 
component analysis, Clustering of seasonal events: A simulation study using circular methods, Finding groups in structural equation modeling through the partial least squares algorithm, Pseudo-quantile functional data clustering, Piecewise-Global Nonlinear Model Order Reduction for PDE-Constrained Optimization in High-Dimensional Parameter Spaces, Agglomerative and divisive hierarchical Bayesian clustering, Clustering of time series using quantile autocovariances, Degrees of freedom and model selection for \(k\)-means clustering, The cluster correlation-network support vector machine for high-dimensional binary classification, A change-point detection and clustering method in the recurrent-event context, Unnamed Item, On the nonparametric maximum likelihood estimator for Gaussian location mixture densities with application to Gaussian denoising, A statistical model of cluster stability, Determining the Number of Clusters Using Multivariate Ranks, Trimming algorithms for clustering contaminated grouped data and their robustness, A similarity measure for second order properties of non-stationary functional time series with applications to clustering and testing, Hierarchical clustering of continuous variables based on the empirical copula process and permutation linkages, Estimating and clustering curves in the presence of heteroscedastic errors, Effects of resampling in determining the number of clusters in a data set, \(\gamma\)-SUP: a clustering algorithm for cryo-electron microscopy images of asymmetric particles, Simultaneous Supervised and Unsupervised Classification Modeling for Assessing Cluster Analysis and Improving Results Interpretability, Estimating the number of clusters via a corrected clustering instability, A robust linear grouping algorithm, Dependence structure of market states, Model-based feature selection and clustering of RNA-seq data for unsupervised subtype discovery, Cross validation in LASSO and its acceleration, A Clustering Method for 
Categorical Ordinal Data, Clustering Nonstationary Circadian Rhythms using Locally Stationary Wavelet Representations, Unsupervised modelling of a transitional boundary layer, Multilevel Functional Clustering Analysis, Markov-switching state space models for uncovering musical interpretation, Multimodal Language Acquisition Based on Motor Learning and Interaction, Bootstrapping for Significance of Compact Clusters in Multidimensional Datasets, CLUSTERING FUNCTIONAL DATA USING WAVELETS, Discriminative variable selection for clustering with the sparse Fisher-EM algorithm, How Many Clusters? An Information-Theoretic Perspective, Yield and price forecasting for stochastic crop decision planning, Clustering Parkinson’s and Age-Related Voice Impairment Signal Features for Unsupervised Learning, \(K\)-means cloning: adaptive spherical \(K\)-means clustering, A sequential clustering algorithm with applications to gene expression data, Identifying Functional Connectivity in Large-Scale Neural Ensemble Recordings: A Multiscale Data Mining Approach, Deformation analysis in tunnels through curve clustering, Dynamic Tensor Clustering, Validating clusters with the lower bound for sum-of-squares error, Identifying ecosystem patterns from time series of anchovy (Engraulis ringens) and sardine (Sardinops sagax) landings in northern Chile, Visual Similarity Perception of Directed Acyclic Graphs: A Study on Influencing Factors and Similarity Judgment Strategies, Characterizing the Relationship Between HIV-1 Genotype and Phenotype: Prediction-Based Classification, Capturing the Forest but Missing the Trees: Microstates Inadequate for Characterizing Shorter-Scale EEG Dynamics, A fast multiobjective fuzzy clustering with multimeasures combination, Exploring Validity Indices for Clustering Textual Data, On the limits of clustering in high dimensions via cost functions, The next‐generation K‐means algorithm, A confusion index for measuring separation and clustering, Robust and sparse 
\(k\)-means clustering for high-dimensional data, Convex clustering for binary data, Model-based linear clustering, Multiscale clustering for functional data, Constrained clustering of irregularly sampled spatial data, Clustering in the Presence of Scatter, Evaluating reliability of tree-patterns in extreme-K categorical samples problems, A new nonparametric interpoint distance-based measure for assessment of clustering, Finding the Number of Normal Groups in Model-Based Clustering via Constrained Likelihoods, Clustering Categorical Data via Ensembling Dissimilarity Matrices, Improving Spectral Clustering Using the Asymptotic Value of the Normalized Cut, A General Hybrid Clustering Technique, Estimating the Number of Clusters Using Cross-Validation, On Application of a Probabilistic \(K\)-Nearest Neighbors Model for Cluster Validation Problem, Coherence-based time series clustering for statistical inference and visualization of brain connectivity, Optimal string clustering based on a Laplace-like mixture and EM algorithm on a set of strings, Shrinkage Clustering: A Fast and Size-Constrained Algorithm for Biomedical Applications, Finding the Event Structure of Neuronal Spike Trains, Unnamed Item, Clustering time series by linear dependency, IntraClusTSP -- an incremental intra-cluster refinement heuristic algorithm for symmetric travelling salesman problem, Combining Genotype Groups and Recursive Partitioning: An Application to Human Immunodeficiency Virus Type 1 Genetics Data, Tests for statistical significance of a treatment effect in the presence of hidden sub-populations, Joint Entropy Maximization in Kernel-Based Topographic Maps, Tensor factorisation for narrowband single channel source decomposition, Statistical Learning of Nonlinear Stochastic Differential Equations from Nonstationary Time Series using Variational Clustering, Clustering Effects in Unreplicated Factorial Experiments, On the strengths of the self-updating process clustering algorithm, 
Individualized Multidirectional Variable Selection, Hierarchical clustering and matrix completion for the reconstruction of world input-output tables, A Unified Framework for Change Point Detection in High-Dimensional Linear Models, Time series analysis and prediction of nonlinear systems with ensemble learning framework applied to deep learning neural networks, Semiparametric partial common principal component analysis for covariance matrices, Simultaneous estimation of cluster number and feature sparsity in high‐dimensional cluster analysis, Distance Metrics and Clustering Methods for Mixed‐type Data, Ensemble clustering for step data via binning, Heuristics for a cash-collection routing problem with a cluster-first route-second approach, Functional distributional clustering using spatio-temporal data, Applications of monitoring and tracing the evolution of clustering solutions in dynamic datasets, Alpha geodesic distances for clustering of shapes, Comparison of statistical, machine learning, and mathematical modelling methods to investigate the effect of ageing on dog’s cardiovascular system, A testing approach to clustering scalar time series, Zero-inflated time series clustering via ensemble thick-pen transform, Incomplete clustering analysis via multiple imputation, Semidefinite programming based community detection for node-attributed networks and multiplex networks, Testing for Unobserved Heterogeneity via k-means Clustering, Homogeneity and Sparsity Analysis for High-Dimensional Panel Data Models, Clustering dimensionless learning for multiple-physical-regime systems, On efficient model selection for sparse hard and fuzzy center-based clustering algorithms, Nonparametric cluster significance testing with reference to a unimodal null distribution, A Doubly Enhanced EM Algorithm for Model-Based Tensor Clustering, Spontaneous Clustering via Minimum Gamma-Divergence, A Nonparametric Clustering Algorithm with a Quantile-Based Likelihood Estimator, Determining 
the Number of Clusters Using the Weighted Gap Statistic, Segmentation uncertainty in multiple change-point models, Reducing data dimension for cluster detection, Bootstrap method to evaluate tightness of clusters with application to the Korean standard occlusion study, Exhaustive \(k\)-nearest-neighbour subspace clustering, Cluster-based feedback control of turbulent post-stall separated flows