Significance analysis of high-dimensional, low-sample size partially labeled data
Abstract: Classification and clustering are both important topics in statistical learning. A natural question is whether predefined classes really differ from one another, or whether apparent clusters really exist. Specifically, we may want to know whether the two classes defined by class labels (when labels are provided), or the two clusters identified by a clustering algorithm (when labels are not provided), come from the same underlying distribution. Both questions are challenging for high-dimensional, low-sample size data, although each has seen recent development. However, when manually labeling observations is costly, often only a small portion of the class labels is available. In this article, we propose a significance analysis approach for this type of data, namely partially labeled data. Our method uses the whole data set and tests the class difference as if all the labels were observed. Compared with a testing method that ignores the label information, our method provides greater power while maintaining the size, as illustrated by a comprehensive simulation study. Theoretical properties of the proposed method are studied with emphasis on the high-dimensional, low-sample size setting. Our simulated examples help clarify when and how the information extracted from the labeled data can be effective. A real data example further illustrates the usefulness of the proposed method.
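The question posed in the abstract, whether two labeled classes come from the same underlying distribution, can be illustrated with a generic permutation test. The sketch below is not the method proposed in the article: it uses only fully labeled observations and ignores unlabeled ones, and the function name, the mean-difference statistic, the sample sizes, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def permutation_test_mean_difference(x, labels, n_perm=999, seed=0):
    """Permutation p-value for H0: both classes share one distribution.

    Hypothetical helper for illustration only; it is a plain two-sample
    permutation test and does not use any unlabeled observations.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels).astype(bool)

    def statistic(mask):
        # Euclidean distance between the two class mean vectors.
        return np.linalg.norm(x[mask].mean(axis=0) - x[~mask].mean(axis=0))

    observed = statistic(labels)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any real link between class and data.
        exceed += int(statistic(rng.permutation(labels)) >= observed)
    # The add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Simulated high-dimensional, low-sample size data: 20 observations, 500 features.
rng = np.random.default_rng(1)
n, d = 20, 500
labels = np.repeat([0, 1], n // 2)
null_data = rng.standard_normal((n, d))           # no class difference
shifted_data = null_data + 0.5 * labels[:, None]  # class 1 shifted in every feature

p_null = permutation_test_mean_difference(null_data, labels)
p_shift = permutation_test_mean_difference(shifted_data, labels)
print(f"p-value without class difference: {p_null:.3f}")
print(f"p-value with mean shift:          {p_shift:.3f}")
```

A small p-value for the shifted data suggests the classes differ. A test of this kind has no way to use observations whose labels are missing, which is the gap the partially labeled setting addresses.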
Recommendations
- Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data
- Significance testing in clustering
- Robust centroid based classification with minimum error rates for high dimension, low sample size data
- Partition clustering of high dimensional low sample size data based on \(p\)-values
- A two-sample test for high-dimensional data with applications to gene-set testing
Cites work
- scientific article; zbMATH DE number 5957483
- scientific article; zbMATH DE number 1332320
- scientific article; zbMATH DE number 823069
- scientific article; zbMATH DE number 889593
- A Significance Test for the Separation of Two Highly Multivariate Small Samples
- A direct approach to sparse discriminant analysis in ultra-high dimensions
- A test for the mean vector with fewer observations than the dimension
- A two-sample test for high-dimensional data with applications to gene-set testing
- Boundary behavior in high dimension, low sample size asymptotics of PCA
- Distance-Weighted Discrimination
- Distance-weighted support vector machine
- Flexible high-dimensional classification machines and their asymptotic properties
- Geometric Representation of High Dimension, Low Sample Size Data
- Multivariate Theory for Analyzing High Dimensional Data
- Multivariate analysis of variance with fewer observations than the dimension
- On efficient large margin semisupervised learning: method and theory
- On transductive support vector machines
- PCA consistency in high dimension, low sample size context
- Some high-dimensional tests for a one-way MANOVA
- Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data
- Support-vector networks
- The high-dimension, low-sample-size geometric representation holds under mild conditions
- Weighted distance weighted discrimination and its asymptotic properties
Cited in
(2)