A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification (Q308795)

From MaRDI portal
Property / review text: Summary: Gene expression data are typically large, complex, and highly noisy. Their dimension is high, with several thousand genes (features) but only a limited number of observations (samples). Although classical principal component analysis (PCA) is widely used as a standard first step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings for data sets with undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a novel approach based on the maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to retain, we further employ the celebrated Akaike information criterion (AIC), the consistent Akaike information criterion (CAIC), and Bozdogan's information-theoretic measure of complexity (ICOMP) criterion. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach; the hybridized smoothed covariance matrix estimators do not degenerate, allowing PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions.
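The undersized-sample problem described in the summary — a sample covariance matrix of rank at most n − 1 ≪ p, hence singular — and the remedy of a smoothed covariance estimator can be illustrated with a minimal NumPy sketch. This is not the paper's code: the convex shrinkage toward a scaled identity target and the weight `alpha` are illustrative assumptions standing in for the hybridized smoothed estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Undersized setting, as in gene expression data: n samples << p features.
n, p = 20, 500
X = rng.standard_normal((n, p))

# Sample covariance: its rank is at most n - 1 < p, so it is singular.
S = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(S)
print(rank)  # at most n - 1 = 19, far below p = 500

# A simple smoothed (shrinkage) estimator: convex combination of S and a
# scaled identity target.  Illustrative stand-in for the paper's hybridized
# smoothed estimators; alpha = 0.1 is an arbitrary choice.
alpha = 0.1
target = (np.trace(S) / p) * np.eye(p)
S_smooth = (1 - alpha) * S + alpha * target

# Every eigenvalue of S_smooth is at least alpha * trace(S) / p > 0,
# so the smoothed estimator is positive definite and invertible.
eigmin = np.linalg.eigvalsh(S_smooth).min()
print(eigmin > 0)  # True
```

Because the shrinkage target is a positive multiple of the identity, positive definiteness holds for any `alpha` in (0, 1], which is exactly the property that lets PPCA proceed where the raw sample covariance degenerates.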
Property / Mathematics Subject Classification ID: 92B15
Property / Mathematics Subject Classification ID: 62P10
Property / Mathematics Subject Classification ID: 62-07
Property / zbMATH DE Number: 6623996
Property / zbMATH Keywords: principal component analysis
Property / zbMATH Keywords: maximum entropy covariance matrix
Property / zbMATH Keywords: hybridized smoothed covariance estimators
Property / zbMATH Keywords: Akaike's information criterion
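The summary's use of information criteria to pick the number of retained probabilistic PCs can be sketched with AIC applied to the maximum-likelihood PPCA fit of Tipping and Bishop, where the noise variance is the mean of the discarded eigenvalues. The eigenvalue spectrum below (three strong signal directions over a unit noise floor) and the sample size `n` are hypothetical; the paper additionally uses CAIC and ICOMP, which are not reproduced here.

```python
import numpy as np

# Hypothetical setting: n observations in p dimensions, with a covariance
# spectrum of 3 strong signal eigenvalues above a unit noise floor.
n, p = 100, 10
lam = np.array([50.0, 30.0, 20.0] + [1.0] * 7)  # eigenvalues, descending

def ppca_aic(q):
    """AIC for the q-component PPCA maximum-likelihood fit (Tipping & Bishop).

    The ML noise variance sigma^2 is the mean of the p - q discarded
    eigenvalues; the profile log-likelihood uses only the spectrum.
    """
    sigma2 = lam[q:].mean()
    loglik = -n / 2 * (p * np.log(2 * np.pi)
                       + np.log(lam[:q]).sum()
                       + (p - q) * np.log(sigma2)
                       + p)
    k = p * q - q * (q - 1) // 2 + 1  # free parameters in the loadings and sigma^2
    return -2 * loglik + 2 * k

aic = {q: ppca_aic(q) for q in range(1, p)}
best_q = min(aic, key=aic.get)
print(best_q)  # 3: AIC recovers the three signal directions
```

Fewer components leave large eigenvalues in the noise term and inflate the fit penalty; more components add parameters without improving the likelihood, so the criterion bottoms out at the true signal dimension. CAIC and ICOMP replace the `2 * k` term with stronger or covariance-complexity-based penalties but follow the same minimize-the-criterion pattern.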
Property / Wikidata QID: Q35594708
Property / describes a project that uses: boost
Property / MaRDI profile type: MaRDI publication profile
Property / full work available at URL: https://doi.org/10.1155/2015/370640
Property / OpenAlex ID: W2036324035
Property / cites work: Principal component analysis.
Property / cites work: On the maximum-entropy approach to undersized samples
Property / cites work: Probabilistic Principal Component Analysis
Property / cites work: Q4769776
Property / cites work: Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions
Property / cites work: On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models
Property / cites work: Q3286740
Property / cites work: Q3185327
Property / cites work: Q2974127
Property / cites work: Empirical Bayes estimation of the multivariate normal covariance matrix
Property / cites work: Q3878557
Property / cites work: A well-conditioned estimator for large-dimensional covariance matrices
Property / cites work: Q5632131
Property / cites work: Informational complexity criteria for regression models.
Property / cites work: Akaike's information criterion and recent developments in information complexity
Property / cites work: Q4139463
Property / DBLP publication ID: journals/cmmm/PamukcuBC15
 

Latest revision as of 01:26, 14 November 2024

scientific article
Language: English
Label: A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification
Description: scientific article

    Statements

    A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification (English)
    6 September 2016

    Identifiers