RNA bioinformatics (Q2259130)

This timely book surveys the state of the art in methodological approaches to the analysis of RNA data, from the prediction of secondary and tertiary structure to the parsing of massive RNAseq data. The book is structured in three parts. The first presents approaches for predicting and understanding RNA structures, the second focuses on the analysis of high-throughput RNA sequencing data, and the third presents web resources for RNA data analysis. The first part opens with an overview of free energy-based approaches to RNA secondary structure prediction, such as RNAfold and the McCaskill method. Next, the authors introduce the methodological details required for predicting structures and deleterious RNA mutations, and for using these predictions in computational RNA design. The second chapter focuses on an alternative approach to predicting secondary structures, one based on multiple alignments. Starting with an example that highlights the differences between the two approaches, the author continues with an overview of commonly used tools for structure prediction from aligned sequences. Next, methods based on the covariation of base-paired positions and on phylogenetic (evolutionary) information are presented in detail. The use of maximum expected gain (MEG) estimators, the choice of the probabilistic model \(p(\theta|A)\) and the choice of the gain function \(G(\theta,y)\) are also discussed. The chapter concludes with the formulation of the mathematically related problems of predicting RNA-RNA interactions and of determining the common joint structure of two aligned RNA sequences. In the third chapter, a simple protocol for inferring global pairwise alignments of RNAs is proposed. Following a brief description of the computational methods used and the corresponding test datasets, the authors compare the performance of the algorithms and comment on the results using examples.
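
For orientation, an MEG estimator of the kind mentioned above can be written in the review's notation. The form below is the standard maximum expected gain formulation and is assumed, not confirmed, to match the chapter's; \(A\) is the input alignment, the unknown reference structure \(\theta\) is treated as random, \(y\) is the predicted structure, and \(\mathcal{S}(A)\) (notation introduced here) is the set of candidate secondary structures for \(A\):

```latex
\hat{y} \;=\; \operatorname*{arg\,max}_{y \in \mathcal{S}(A)} \mathbb{E}_{\theta \mid A}\bigl[ G(\theta, y) \bigr]
        \;=\; \operatorname*{arg\,max}_{y \in \mathcal{S}(A)} \sum_{\theta \in \mathcal{S}(A)} G(\theta, y)\, p(\theta \mid A)
```

If the gain decomposes over base pairs, for example \(G(\theta,y)\) counting the pairs of \(y\) that also occur in \(\theta\), linearity of expectation reduces the objective to \(\sum_{(i,j) \in y} p_{ij}\), where \(p_{ij}\) is the posterior probability under \(p(\theta|A)\) that positions \(i\) and \(j\) pair.
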
The fourth chapter focuses on de novo motif discovery in RNA secondary structures using RNAProfile. The authors describe in detail the algorithm, which consists of selecting candidate regions and then identifying motifs with a heuristic approach. An example run of the tool is also included, together with a description of the parameters involved, and the chapter concludes with several examples. The fifth chapter focuses on drawing and editing RNA secondary structures. Starting with an outline of the objectives of RNA visualization and an overview of existing tools, the authors present in detail the commonly used file formats for RNA secondary structures and their typical representations: the linear layout, the circular layout, squiggle plots and the tree layout. The authors also discuss the 3D representation of pseudoknots and interactions. In the sixth chapter, the authors discuss the prediction and modelling of RNA 3D structures. Starting with the classification of base-pairing interactions, they turn to the role of RNA motifs and to the steps for predicting RNA tertiary structure, including the prediction of a secondary structure scaffold and the use of an interaction graph with motif insertion. The chapter concludes with an overview of methods for reconstructing tertiary structures from base-pairing interaction networks. The last chapter of the first part focuses on the fast prediction of RNA-RNA interactions using a heuristic approach. Following a description of the test dataset, the author describes in detail the algorithm for RNA secondary structure prediction, the RNA-RNA interaction prediction step and the parallelization of the approach to reduce the run time. The second part of the book addresses the analysis of high-throughput RNA sequencing data. The eighth chapter opens with an overview of quality control approaches.
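
Among the file formats for secondary structures covered in chapter five, a widely used one is the dot-bracket notation of the Vienna RNA tools; whether it is the chapter's main example is an assumption here. As a minimal sketch, the hypothetical helper `dot_bracket_to_pairs` below converts a pseudoknot-free dot-bracket string into the list of base pairs that linear, circular or squiggle layouts then draw.

```python
def dot_bracket_to_pairs(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), 0-based with i < j, from a dot-bracket string.

    Only '(' and ')' denote pairs and '.' marks an unpaired base. Extra
    bracket types used for pseudoknots (e.g. '[' ']') are not handled here.
    """
    stack: list[int] = []                 # positions of still-open '('
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # close the most recent '('
        elif char != ".":
            raise ValueError(f"unsupported character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(dot_bracket_to_pairs("((..((...))..))"))  # [(0, 14), (1, 13), (4, 10), (5, 9)]
```

A stack suffices because nested (pseudoknot-free) structures close pairs in last-opened, first-closed order.
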
The quality checks of chapter eight start with the sequencing quality derived from the FASTQ data; the nucleotide composition and the effect of PCR amplification are introduced next. The authors also discuss tRNA/rRNA contamination, saturation tests of sequencing depth, reproducibility between replicates, coverage uniformity and the distribution of reads across annotation classes. In the ninth chapter, the author reviews the mapping of RNAseq data, addressing both hardware and software limitations and features. The SpliceMap approach is presented in detail, including half-read mapping, seed selection and extension, and the junction search. The detection of junctions from junction read alignments is also discussed. In chapter ten, the authors discuss transcriptome quantification from RNAseq data, focusing on a pipeline built on a series of \texttt{R} packages for detecting differentially expressed mRNAs and miRNAs. The authors present in detail the characteristics and parameters of the required \texttt{R} libraries. Steps such as sequence alignment, post-processing and the detection of differentially expressed transcripts with DESeq are also covered. The eleventh chapter presents yet another facet of RNAseq analysis, transcriptome assembly and the analysis of alternative splicing, focusing on the PIntron approach. First, the authors describe the installation requirements and the input data. Next, the pipeline and execution details, such as parameter ranges and requirements, are reviewed extensively. The various output formats, including GTF and JSON, are also discussed. In the twelfth chapter, the authors review a straightforward (easy and reproducible) computational protocol for detecting post-transcriptional RNA editing events, illustrated on human RNAseq and DNAseq data. The authors begin with an overview of hardware and software prerequisites and the steps for installing REDItools.
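
The first quality check of chapter eight, sequencing quality derived from FASTQ data, is commonly summarized as the mean Phred score at each read position. The sketch below assumes standard four-line FASTQ records with Phred+33 quality encoding; the function name is hypothetical and is not taken from any tool described in the book.

```python
from io import StringIO
from typing import TextIO


def mean_quality_per_position(fastq: TextIO, offset: int = 33) -> list[float]:
    """Mean Phred score at each read position over all records of a FASTQ stream.

    Assumes 4-line records and Phred+<offset> encoding (33 by default).
    Reads may differ in length; each position averages only reads covering it.
    """
    totals: list[int] = []   # summed Phred scores per read position
    counts: list[int] = []   # number of reads covering each position
    lines = [line.rstrip("\r\n") for line in fastq]
    for quality in lines[3::4]:              # 4th line of each record holds qualities
        for pos, char in enumerate(quality):
            if pos == len(totals):           # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset  # ASCII code minus offset = Phred score
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n5?I\n")
    print(mean_quality_per_position(example))  # [30.0, 35.0, 40.0, 40.0]
```

A per-position profile like this is what reveals the typical quality decay towards the 3' end of reads.
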
The REDItools protocol itself is then described, including advice on mapping RNAseq reads, the SAM-to-BAM conversion, the optional BLAT correction and the detection of RNA editing candidates using matched DNAseq and RNAseq data. The thirteenth chapter gives an extensive review of miRNA target prediction. Starting with an overview of the state of the art, the authors continue with the biological background required for predicting targets. The features of miRNA-mRNA interactions and the way in which they are incorporated into prediction algorithms are discussed in detail. An outline of online miRNA databases and prediction tools is also included. The chapter concludes with miRNA functional annotation tools that use the information provided by high-throughput experiments. In the fourteenth chapter, the authors continue the discussion of miRNAs, focusing on the use of deep sequencing data to identify editing sites in mature miRNAs. After describing the initial steps (filtering low-quality reads, trimming adapters, genome alignment and mismatch mapping to miRNA precursors), the authors discuss how binomial statistics can be used to identify and remove sequencing errors, and present an approach for removing SNPs from the list of statistically significant modifications. In chapter fifteen, the authors present NGS-Trex, an automatic analysis workflow for RNAseq data, with examples on human mRNAseq data. After an overview of the pipeline, the individual steps are presented: data submission, pre-processing, and the mapping and annotation of sequences. The data mining and statistical approaches included in the workflow are also described. In the sixteenth chapter, the authors present a method that links raw NGS data to taxonomic profiling through e-DNA metabarcoding, illustrated on the human microbiome. The SFF tools used for this analysis are described in detail.
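
The binomial reasoning of chapter fourteen can be sketched as a one-sided test. Under the null hypothesis that every read covering a position shows a mismatch only through sequencing error, with a fixed per-base error rate, the number of mismatching reads is binomially distributed, and positions with an improbably high mismatch count are kept as candidate modifications. The error rate, the significance threshold and the function names below are illustrative assumptions, not the chapter's exact procedure; SNP filtering and multiple-testing correction are omitted.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); pmf terms are computed in log space."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)  # guard against rounding slightly above 1


def is_candidate_modification(mismatches: int, coverage: int,
                              error_rate: float, alpha: float = 0.01) -> bool:
    """True if `mismatches` out of `coverage` reads is unlikely under error alone.

    Null hypothesis: each read covering the position shows a mismatch
    independently with probability `error_rate` (a hypothetical per-base
    sequencing error rate). No SNP filtering or multiple-testing correction.
    """
    return binomial_upper_tail(mismatches, coverage, error_rate) < alpha


if __name__ == "__main__":
    print(is_candidate_modification(1, 100, 0.01))   # False: about one mismatch is expected
    print(is_candidate_modification(10, 100, 0.01))  # True: far above the expected count
```

Working in log space keeps the pmf terms finite even at coverages where the binomial coefficients would overflow a float.
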
In chapter sixteen, particular attention is given to the denoising procedure and the taxonomic classification. The seventeenth chapter presents methods for deciphering metatranscriptomic data. It includes a detailed description of the SortMeRNA tool, its input and output, and the required parameters; the examples focus on the classification of rRNA. In the last chapter of this part, the authors review RIPseq, a sequencing method for determining RNA-protein interactions, together with its variants. They outline the bioinformatic analysis of the sequencing data, presenting in detail the mapping to the reference genome, the use of read counts and the similarities with RNAseq and ChIPseq. In terms of tools, RIPSeeker and PARalyzer are presented. The third part of the book presents web resources for RNA data analysis. Chapter nineteen describes the Vienna RNA Web services, which support the analysis of non-coding RNAs by predicting secondary structures and by providing efficient sequence design that takes RNA-RNA hybridization into account. Following an overview of hardware and software requirements and of the accepted input formats, the authors describe in detail the prediction of secondary structures with the RNAfold web server, of consensus secondary structures with RNAalifold, and of RNA-RNA interactions with RNAcofold and RNAup. The chapter concludes with an overview of tools for RNA design (such as the RNAinverse web server) and the analysis of folding kinetics with the barriers server. In the twentieth chapter, the authors present ExpEdit, a tool for exploring the RNA editing potential of RNAseq data. After describing the technical requirements, the authors give a detailed description of the method, followed by a hands-on example using the ExpEdit GUI.
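
RNAfold, used in chapters one and nineteen, predicts secondary structure by minimizing free energy under a nearest-neighbour energy model, which is too large for a short sketch. To convey only the dynamic-programming idea shared by such predictors, the block below implements the much simpler Nussinov recursion, which maximizes the number of nested base pairs instead of minimizing free energy. It is a deliberate substitute, not how RNAfold works, and the minimum hairpin loop length of three is an assumption.

```python
def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximum number of nested base pairs in `seq` and one optimal structure.

    Simplified stand-in for energy-based folding: every allowed pair
    (A-U, C-G, or G-U wobble; uppercase RNA assumed) scores 1, and a hairpin
    loop must enclose at least `min_loop` unpaired bases.
    """
    allowed = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = maximum number of pairs in seq[i..j] (0 for short spans).
    best = [[0] * n for _ in range(n)]

    def pair_score(i: int, j: int, k: int) -> int:
        """Score when j pairs with k (i <= k): pairs left of k + pairs inside + 1."""
        left = best[i][k - 1] if k > i else 0
        return left + best[k + 1][j - 1] + 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                  # case 1: j is unpaired
            for k in range(i, j - min_loop):        # case 2: j pairs with some k
                if (seq[k], seq[j]) in allowed:
                    score = max(score, pair_score(i, j, k))
            best[i][j] = score

    # Traceback: rebuild one optimal dot-bracket string from the table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue                                # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))                # j left unpaired
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in allowed and best[i][j] == pair_score(i, j, k):
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return (best[0][n - 1] if n else 0), "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Energy-based methods keep the same interval decomposition but replace the +1 per pair with loop- and stacking-dependent energy terms.
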
The twenty-first chapter serves as a guide to annotating 5' and 3' untranslated regions (UTRs) and their cis-regulatory regions in the UTRsite collection. The authors discuss how to detect and extract relevant orthologous sequences, how to infer their secondary structure and how to generate and use MotifPattern. The twenty-second chapter describes Rfam, a collection of non-coding RNA sequences. Following a discussion of how the transcripts are organised into families and clans, the authors present in detail the approach to sequence searching and the steps for retrieving family information. The chapter concludes with suggestions for browsing the database or downloading it from the FTP site. In the twenty-third chapter, the authors show how to use ASPIcDB, a database for alternative splicing analysis. The authors discuss every step of a typical analysis and include suggestions and screenshots for each. Another tool for alternative splicing is presented in the twenty-fourth chapter, where the authors analyse alternative splicing events in custom gene datasets using the AStalavista tool. The chapter begins with details on how to install the tool and retrieve RNAseq alignments. Next, the authors present the method in detail and provide an example using GENCODE. In the last chapter, the authors discuss the computational design of artificial RNA molecules with applications to gene regulation. After reviewing the state of the art, the authors discuss the components of the design of functional sRNAs and give an overview of available online resources. The chapter concludes with an application of this approach to the design of antagomirs and effective miRNA sponges. Although written in an accessible manner, the book requires an extensive background in bioinformatics and biology. Nonetheless, the authors often provide a sufficient description of each field and its state of the art, together with numerous references.
The major advantage of the book is that it contains material that will appeal both to bioinformaticians seeking applications for their methods and to biologists seeking answers to specific biological questions.

Reviewed by Irina Ioana Mohorianu

zbMATH Keywords

RNA secondary structure

RNA tertiary structure

multiple alignment

motif discovery