Statistical methods for microarray data analysis. Methods and protocols (Q2259129)

From MaRDI portal
scientific article

    Statements

    Statistical methods for microarray data analysis. Methods and protocols (English)
    27 February 2015
    This book covers a broad range of topics, from the normalization of expression levels and the evaluation of experimental noise to the identification of putative networks through either multivariate analysis or clustering. It is organized in eleven chapters and opens with a concise overview of the biological details relevant to researchers who analyse microarray data but do not have a biology background. By briefly discussing the pre-, during- and post-hybridization procedures and introducing the basics of data processing, the authors set both the biological and the computational context for the analysis. The second chapter offers more information on concepts from molecular biology and delves deeper into the technological principles behind microarrays and the characteristics of the different types of microarrays that are available. The chapter concludes with an introduction to data processing, giving an overview of the steps for data calibration and statistical analysis and of the challenges that accompany the vast amount of information produced by such an experiment. The third chapter focuses on multiple hypothesis testing. It begins with the formal definitions, an overview of the single-step, step-down and step-up approaches and a discussion of the interpretation of \(p\)-values; the author then presents the family-wise error rate (FWER) approach, which consists of a series of refinements of the Bonferroni procedure. The following sections discuss the false discovery rate and multiple testing procedures based on mixture densities. The fourth chapter is built on a new approach to identifying differentially expressed genes, the \(\delta\)-sequence method. First, the non-parametric empirical Bayes methodology (NEBM) is introduced, followed by a detailed description of its steps. The construction of the \(\delta\)-sequence is then discussed, and the chapter concludes with an analysis of the performance of the newly proposed method. 
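The three families of procedures named in the third chapter can be illustrated with their textbook representatives. The sketch below is not taken from the book: it assumes the Bonferroni correction as the single-step example, Holm's procedure as the step-down example and the Benjamini-Hochberg procedure as the step-up example, and the function names and sample \(p\)-values are hypothetical.

```python
# Illustrative sketch only; function names and data are assumptions,
# not the book's code.

def bonferroni(pvals, alpha=0.05):
    """Single-step: compare every p-value with the same threshold alpha/m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Step-down: start at the smallest p-value, stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject

def benjamini_hochberg(pvals, q=0.05):
    """Step-up: find the largest rank k with p_(k) <= k*q/m, reject ranks 1..k."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Hypothetical p-values for five genes.
pvals = [0.001, 0.011, 0.02, 0.035, 0.5]
print(bonferroni(pvals))
print(holm(pvals))
print(benjamini_hochberg(pvals))
```

On this toy input the number of rejections grows from Bonferroni to Holm to Benjamini-Hochberg. The first two control the FWER, while the last controls the false discovery rate, a less stringent criterion.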
In the fifth chapter, the author discusses a normalization of expression levels based on the \(\delta\)-sequence introduced earlier. Following a brief overview of standard normalization methods, such as quantile and global normalization, the author presents the \(\delta\)-sequence based normalization in detail. Results on simulated data conclude the chapter. In the sixth chapter, the author focuses on the analysis of multivariate gene signatures for censored breast cancer survival data. He first describes the stagewise forward search with shrinkage information and then presents extensive simulation results. The authors of the seventh chapter describe in detail the principles of clustering, with application to gene expression, and the role of normal mixture models. Two approaches are then discussed, the clustering of tissues and the clustering of gene profiles; detailed examples are presented for both. The eighth chapter shifts the focus to the identification of putative networks using multivariate gene expression data. Following a description of the mathematical background (including the discrete Markov random field model and the estimation of the model parameters and of the emission and prior probabilities), the authors discuss at length the applicability of the approach in a simulation study. The chapter concludes with an example from a time-course gene expression study of TrkA- and TrkB-transfected neuroblastoma cell lines. In the ninth chapter, the author presents an approach for the detection of outliers in high-throughput data analysis. First, the statistical problem and the general methodology are introduced, together with the formulation of the two-group problem and its generalization to the multivariate case, in which more than two groups are present. Next, the different methods in use are evaluated in simulation studies. 
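Of the standard methods mentioned in the fifth chapter, quantile normalization is easy to sketch. The code below shows the usual textbook version, not the \(\delta\)-sequence based normalization, whose details the review does not give. The function name and the toy data are hypothetical, and ties within an array are assumed to be absent.

```python
# Illustrative sketch of textbook quantile normalization; names and data
# are assumptions. Ties within an array are not handled specially.

def quantile_normalize(columns):
    """columns: one list of expression values per array, all of equal length.

    Every array is mapped onto a common reference distribution, namely the
    mean across arrays of the values that share the same rank.
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all arrays.
    rank_means = [
        sum(sc[r] for sc in sorted_cols) / len(columns)
        for r in range(n_genes)
    ]
    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda g: col[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays measuring the same three genes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))
```

After normalization every array contains the same set of values, so differences between arrays are reduced to differences in the ranking of genes.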
The tenth chapter focuses on the effect of noise and annotation imprecision on the quality of the interpretation of microarray experiments. Following an overview of the technical aspects of data quality (including various biases and RNA quality), the authors present a set of standards for an ideal microarray analysis. Problematic annotations and data standardization are then discussed at length. In the eleventh chapter, the authors discuss the effect of aggregated expression intensities on microarray analysis. Alternatives such as the \(\rho(X,Y)\) representation are also discussed. Next, the noise effect is included and the effects of the law of large numbers and of random summation are presented. The book concludes with an overview of tests for assessing the normality of the gene expression distribution. The test procedures and the results on the hyperdip and tell.hyperdip data are discussed extensively. Although written in an accessible manner, the book requires a prior understanding of machine learning and statistical approaches. It is therefore appropriate for research students and post-docs, as well as for lecturers looking for hands-on examples.
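The review does not say which normality tests the book uses. As one well-known example, and purely as an assumption, the sketch below computes the Jarque-Bera statistic, which combines the sample skewness \(S\) and kurtosis \(K\) as \(n/6 \cdot (S^2 + (K-3)^2/4)\). The function name and the toy data are hypothetical.

```python
# Illustrative sketch; the Jarque-Bera test is a swapped-in example, not
# necessarily a test used in the book.

def jarque_bera(values):
    """Return the Jarque-Bera statistic of a sample.

    Under normality the statistic is asymptotically chi-square distributed
    with 2 degrees of freedom, so values above about 5.99 suggest rejecting
    normality at the 5% level for large samples.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divisor n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# Hypothetical log-expression values of one gene across five samples.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(jarque_bera(sample))
```

The chi-square approximation is asymptotic, so for the small sample sizes typical of microarray studies the statistic should be read as a rough indication only.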
    microarray
    multiple hypothesis testing
    gene selection, \(\delta\)-sequence method
    normalization
    censored survival data
    clustering
    network analysis
    noise
    aggregation effect
    normality