Multiple suboptimal solutions for prediction rules in gene expression data (Q382664): Difference between revisions

Summary: This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression data. We focus on pattern recognition to extract informative features embedded in the data for the prediction of phenotypes. It has been pointed out that severe difficulties arise from the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative genes with high performance from among all observed genes. We investigate why this difficulty persists even though analysis methods and learning algorithms are being actively proposed in statistical machine learning. We focus on the mutual coherence, or the absolute value of the Pearson correlation between two genes, and describe the distributions of this correlation for the selected set of genes and for the total set. We show that the problem of finding informative genes in high-dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence.
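
The quantity named in the summary can be illustrated with a minimal sketch. It assumes that mutual coherence is taken as the largest absolute Pearson correlation between two distinct genes (columns), which is one common reading and not necessarily the exact definition used in the paper. The data are synthetic, and the names `abs_correlations`, `mutual_coherence`, and `selected` are hypothetical.

```python
import numpy as np

def abs_correlations(X):
    """Absolute Pearson correlations for all distinct column pairs of X.

    X has shape (n_subjects, n_genes); each column is one gene.
    """
    R = np.corrcoef(X, rowvar=False)       # (n_genes, n_genes) correlation matrix
    iu = np.triu_indices_from(R, k=1)      # index pairs with i < j
    return np.abs(R[iu])

def mutual_coherence(X):
    """Largest absolute Pearson correlation between two distinct columns of X."""
    return float(abs_correlations(X).max())

# Synthetic stand-in for expression data (assumption: not the paper's data).
rng = np.random.default_rng(0)
n_subjects, n_genes = 20, 500              # far more genes than subjects
X = rng.standard_normal((n_subjects, n_genes))

selected = np.arange(10)                   # hypothetical selected gene set
mu_all = mutual_coherence(X)
mu_selected = mutual_coherence(X[:, selected])
print(f"coherence, all genes: {mu_all:.3f}; selected set: {mu_selected:.3f}")
```

With few subjects and many genes, even independent columns tend to show large absolute correlations by chance. Comparing the distribution returned by `abs_correlations` for a selected set against the total set is one way to explore the effect the summary describes.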

describes a project that uses

ElemStatLearn

PoiClaClu

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1155/2013/798189

cites work

Q4864293

Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ¹ minimization

Stable signal recovery from incomplete and inaccurate measurements

Exploration, normalization, and summaries of high density oligonucleotide array probe level data

Gene selection for cancer classification using support vector machines

Regularization and Variable Selection Via the Elastic Net

Q3093381

The elements of statistical learning. Data mining, inference, and prediction

Q4792072

A boosting method for maximization of the area under the ROC curve

On biological validity indices for soft clustering algorithms for gene expression data

Sparse and Redundant Representations

Classification and clustering of sequencing data using a Poisson model

Identifiers

zbMATH Open document ID

1275.92073

DOI

10.1155/2013/798189

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:382664

@@ Property / describes a project that uses @@
+PoiClaClu
@@ Property / describes a project that uses: PoiClaClu / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1155/2013/798189
+Normal rank
@@ Property / OpenAlex ID @@
+W2117073647
@@ Property / OpenAlex ID: W2117073647 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4864293
@@ Property / cites work: Q4864293 / rank @@
+Normal rank
@@ Property / cites work @@
+Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ¹ minimization
+Normal rank
@@ Property / cites work @@
+Stable signal recovery from incomplete and inaccurate measurements
+Normal rank
@@ Property / cites work @@
+Exploration, normalization, and summaries of high density oligonucleotide array probe level data
+Normal rank
@@ Property / cites work @@
+Gene selection for cancer classification using support vector machines
+Normal rank
@@ Property / cites work @@
+Regularization and Variable Selection Via the Elastic Net
+Normal rank
@@ Property / cites work @@
+Q3093381
@@ Property / cites work: Q3093381 / rank @@
+Normal rank
@@ Property / cites work @@
+The elements of statistical learning. Data mining, inference, and prediction
+Normal rank
@@ Property / cites work @@
+Q4792072
@@ Property / cites work: Q4792072 / rank @@
+Normal rank
@@ Property / cites work @@
+A boosting method for maximization of the area under the ROC curve
+Normal rank
@@ Property / cites work @@
+On biological validity indices for soft clustering algorithms for gene expression data
+Normal rank
@@ Property / cites work @@
+Sparse and Redundant Representations
@@ Property / cites work: Sparse and Redundant Representations / rank @@
+Normal rank
@@ Property / cites work @@
+Classification and clustering of sequencing data using a Poisson model
+Normal rank