RNA sequence, structure, and function: computational and bioinformatic methods (Q457828)

From MaRDI portal





scientific article

    Statements

    RNA sequence, structure, and function: computational and bioinformatic methods (English)
    29 September 2014
    This book is a timely addition that describes methods and approaches developed for cutting-edge topics related to non-coding RNAs (ncRNAs). The book is structured in 23 chapters and contains reviews on different research directions such as the principles of RNA structure architecture, methods to predict RNA structure based on the consensus of multiple sequence alignments or on stochastic context-free grammars (SCFGs), motif discovery, phylogenetic analysis, and the prediction of RNA-RNA or RNA-protein interactions. The first chapter reviews some biological and computational concepts linked to 2D and 3D RNA structure, RNA-based interactions and some classes of small RNAs (such as small interfering RNAs and microRNAs). The chapter commences with commonly used representations of RNA secondary structure and introduces the Nussinov algorithm for RNA folding. It continues with mutual-information-based approaches for inferring the secondary structure from multiple alignments. The chapter concludes with an overview of open problems in RNA bioinformatics and a list of benchmarks used for the different prediction tasks. The second chapter focuses on the properties of RNA, from enzymatic to informational, including a brief description of the molecule itself and the principles of RNA folding. The chapter concludes with a brief overview of RNA motifs and RNA ligands. The third chapter describes an approach for determining the secondary structure using a set of nearest-neighbour parameters determined through melting experiments. The chapter commences with an overview of the method, showing how these parameters can be used to quantify the stability of the resulting structure. The second section presents the experimental steps, such as the two-state approximation, the fitting of the curves for non-self-complementary bimolecular folding and the calculation of the melting temperature. The authors also discuss the accepted assumptions on baselines and heat capacity changes.
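As a rough illustration of the melting-temperature calculation under the two-state approximation, the sketch below uses the standard van 't Hoff relation for a bimolecular duplex, T_m = ΔH° / (ΔS° + R ln(C_T/4)) for non-self-complementary strands (C_T replaces C_T/4 for self-complementary ones). The function name and the numerical values of ΔH°, ΔS° and C_T are illustrative assumptions, not values taken from the book.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def melting_temperature_celsius(delta_h, delta_s, total_conc,
                                self_complementary=False):
    """Two-state melting temperature of a bimolecular duplex, in degrees Celsius.

    delta_h    -- enthalpy change of duplex formation, kcal/mol (negative)
    delta_s    -- entropy change of duplex formation, kcal/(mol*K) (negative)
    total_conc -- total strand concentration C_T, mol/L
    """
    if total_conc <= 0:
        raise ValueError("total strand concentration must be positive")
    # Non-self-complementary duplexes use C_T/4, self-complementary ones C_T.
    factor = 1.0 if self_complementary else 4.0
    tm_kelvin = delta_h / (delta_s + R_KCAL * math.log(total_conc / factor))
    return tm_kelvin - 273.15


# Hypothetical duplex: dH = -60 kcal/mol, dS = -160 cal/(mol*K), C_T = 0.1 mM.
tm = melting_temperature_celsius(-60.0, -0.160, 1e-4)
print(f"Tm = {tm:.1f} C")
```

In this picture, the nearest-neighbour parameters supply ΔH° and ΔS° as sums over adjacent base-pair stacks, so the same function could be fed with predicted rather than fitted values.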
Next, the nearest-neighbour rules are presented in conjunction with structure-informed parameter estimation methods. The chapter concludes with a comparison of several versions of the proposed method and a brief description of available parameter collections, including experimentally determined RNA parameters and pseudoknot parameters. The fourth chapter describes methods for predicting RNA secondary structure using energy-based calculations, namely dynamic programming approaches aimed at determining optimal and suboptimal structures, which address the inaccuracies of the minimum free energy (MFE) approach. The chapter commences with dynamic programming concepts applied to the folding problem and approaches to check the prediction accuracy using known structures of the ribosomal subunits. Next, the partition function is introduced and reliability measures and visualizations are presented. The chapter continues with a description of suboptimal foldings and alternative approaches to the MFE criterion and concludes with notions of RNA folding kinetics such as co-transcriptional folding and coarse-grained calculation of folding dynamics. The fifth chapter introduces stochastic context-free grammars (SCFGs) and provides the theoretical background for their use in RNA secondary structure analyses. Following a description of the concepts of languages, derivations, normal forms, parsing and ambiguity, the author introduces the rules for SCFGs and a method to connect these with the terminology of hidden Markov models (HMMs). Next, the analysis of RNA secondary structure is discussed in the SCFG context, with a particular focus on how SCFG parses can indicate the existence of secondary structure, how to deal with semantic ambiguity, and grammar design and its trade-offs. The sixth chapter presents the history and terminology behind databases such as NONCODE, miRBase, Rfam and SILVA.
The curation of the entries submitted to these databases, the pros and cons of manual versus automated annotation and the use of sequence versus alignment databases are discussed. Next, distinctive features of each of the four databases are presented: NONCODE features the ``process function classification'' (PFC) and a Boolean search engine that enables efficient data mining, miRBase contains all known miRNAs, Rfam is the largest general alignment database currently available and SILVA is mainly used for the correct identification of rRNA. The seventh chapter reviews methods for predicting the secondary structure using the energy-based consensus and multiple sequence alignments. It commences with an analysis of conserved RNA structures in functional RNAs and continues with an in-depth analysis of the RNAalifold algorithm. The authors describe in detail the averaged energy minimization for multiple alignments and the covariance score and its improved version based on RIBOSUM matrices. A step-by-step guide for RNAalifold is also included. The chapter concludes with an overview of alternative methods and new approaches. The eighth chapter is structured as a hands-on description of the use of SCFGs for the prediction of RNA secondary structure. Commencing with a description of SCFGs, the chapter continues with a series of algorithms such as the basic CYK (Cocke-Younger-Kasami) algorithm, the highest-probability parse and the inside-outside algorithm. The discussion includes an objective comparison between SCFGs and the thermodynamic models (described in previous chapters) and a description of pfold, an extension of SCFG models using phylogenies. The ninth chapter focuses on the role of Infernal and covariance models (CMs), extensively used for Rfam, in annotating functional RNAs, with the archaeon \textit{Methanobrevibacter ruminantium} presented as an example.
First, family-specific RNA search methods are presented, followed by a detailed description of probabilistic CMs. Next, Infernal is introduced together with a step-by-step guide to its usage, followed by a list of considerations and warnings regarding its predictions. The chapter concludes with a comparison of Infernal to other family-specific methods. In the tenth chapter, the author narrows the prediction of ncRNAs to specific classes such as tRNAs, snoRNAs or miRNAs, characterized by specific sequence motifs and structural features, and describes in detail a three-tiered approach comprising a sensitivity filter, a specificity selection and a prediction scoring. The chapter commences with a description of tRNAs and the tools developed for their prediction. Next, the biological characteristics of rRNA and snoRNAs and the tools developed for their identification and target prediction are described. The author concludes with an overview of newly discovered RNA classes such as the Y RNAs and vault RNAs, also characterized by a specific secondary structure. The eleventh chapter introduces a new perspective for determining RNA structure: the abstract shape analysis of RNA, which investigates the complete Boltzmann ensemble of the secondary structures of an RNA molecule. The chapter starts with a description of this approach in the context of MFE folding. The mathematical framework is then presented in detail, with emphasis on the thermodynamic aspects, the properties of shape classes and the computational tasks, and the representations of RNA structures are thoroughly reviewed. The chapter continues with computational methods for the identification and characterization of the representative structures, for which numerous examples are presented and discussed. The chapter concludes with a description of the RNAshapes package and related software.
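To make the idea of abstract shapes more concrete, here is a minimal sketch that maps dot-bracket structures to their most abstract shape (helix nesting only, with unpaired bases, bulges and interior loops ignored) and sums Boltzmann weights per shape class. The function names, the toy structures and their energies are hypothetical, the normalization runs over the listed structures only rather than the full ensemble, and the code is an illustration, not the RNAshapes implementation.

```python
import math
from collections import defaultdict

RT = 0.0019872 * 310.15  # kcal/mol at 37 degrees Celsius


def pair_tree(dot_bracket):
    """Parse a dot-bracket string; each base pair becomes a list of its child pairs."""
    root = []
    stack = [root]
    for ch in dot_bracket:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return root


def _shape(node):
    # A pair enclosing exactly one other pair (stack, bulge or interior loop)
    # belongs to the same helix at this abstraction level, so skip down the chain.
    while len(node) == 1:
        node = node[0]
    return "[" + "".join(_shape(child) for child in node) + "]"


def abstract_shape(dot_bracket):
    """Most abstract shape string; '_' denotes the unpaired (open) chain."""
    return "".join(_shape(n) for n in pair_tree(dot_bracket)) or "_"


def shape_probabilities(structures):
    """Sum Boltzmann weights exp(-E/RT) per shape over (structure, energy) pairs."""
    weights = defaultdict(float)
    for dot_bracket, energy in structures:
        weights[abstract_shape(dot_bracket)] += math.exp(-energy / RT)
    z = sum(weights.values())
    return {shape: w / z for shape, w in weights.items()}


# Toy sample of structures with made-up free energies in kcal/mol.
sample = [
    ("((((...))))....", -3.0),
    (".((((...))))...", -2.8),
    ("((...))((...)).", -1.0),
    ("...............", 0.0),
]
probs = shape_probabilities(sample)
for shape, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(shape, round(p, 3))
```

Grouping structures this way shows why a shape can be more probable than any single structure in it: the two hairpin variants above pool their weights under the shape `[]`.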
Continuing the comparative analysis of structures, the twelfth chapter uses pseudoknots for the comparison of structures formed on the same or different sequences. The authors discuss the first distances used for the comparison of structures on the same sequence, such as the base pair distance or the Hausdorff distance, and continue with approaches for different sequences. The tree edit model and distance are discussed in detail. The chapter concludes with pseudoknot examples and a brief description of approaches using alignments or alternative encodings, representing the next frontier of algorithmic challenge. The thirteenth chapter describes RNA structural alignment approaches, starting with Sankoff-based ones. First, the principles are stated and the pairwise structural alignment algorithm is used as an illustrative example. Following a brief description of multiple structural alignment approaches, the authors present the implementation details, discussing heuristics developed for this problem. The fourteenth chapter focuses on non-Sankoff approaches for structural alignments and describes how to extract candidates from local structures, how to adjust the scoring function and how to check the consistency of alignments and structures. The chapter also contains a list of software tools built on non-Sankoff approaches. In the fifteenth chapter the authors discuss the de novo discovery of motifs (commonalities in structure) that could provide an additional description of ncRNAs. Following a description of the CMfinder discovery pathway, the authors present the CMfinder algorithm in detail: the heuristic alignment, the model inference, the realignment and motif scoring. The chapter concludes with examples from Rfam and modENCODE data. The sixteenth chapter focuses on the analysis of the phylogeny of RNA structure and its use for inferring evolution.
First, Darwin's evolutionary theory (the selection of populations) is transposed into mathematical language, and basic, introductory notions of phylogenetic trees are reviewed. Next, the phylogenetic models are introduced and the nucleotide substitution problem is discussed at length, including examples. The design of RNA structures from an evolutionary perspective is then presented, with emphasis on the chemical kinetics of evolution. The advantages and disadvantages of the quasi-species concept and the transition from sequences and structures to genotypes and phenotypes are also discussed. The third section covers applications such as the inference of phylogenetic trees, the phylogenetic background models for genomic screens and the evolution of viruses and bacteria. The seventeenth chapter presents the biology-based editing of RNA structural alignments as an art which requires extensive knowledge and creativity in equal amounts. The authors first present editors, algorithms and tools for RNA structural alignments and the basic principles that govern them. Next, using examples, some of the tools are presented in detail, e.g., SARSE (semi-automated RNA structure editor). The eighteenth chapter introduces 3D RNA structure, proposing a pipeline that combines characteristics of ParAlign and Infernal. First, the concepts are reviewed and the two main types of modelling, template-based and template-free, are described in detail, including lists of available software. For the first class, the satisfaction of spatial restraints and the template search are discussed in detail. For the latter, the authors focus on the local and global refinement of the models. The nineteenth chapter approaches a new aspect of RNA characterization: its interaction with other RNA molecules, i.e., the regulatory targeting of other RNAs by ncRNAs.
Following an overview of the general principles of this type of interaction, the authors review prediction approaches which neglect the intramolecular structures. Next, the approaches which consider these structures are discussed, including accessibility-based approaches and those based on general joint structures. The chapter concludes with a comparative analysis of these methods. The twentieth chapter presents another facet of RNA structure prediction, namely the prediction of microRNAs, a special class of ncRNAs. Following a brief description of their biogenesis and an overview of publicly available databases, the authors transpose the biological requirements into computational terms and introduce a de novo miRNA prediction method. Next, other miRNA prediction tools based on theoretical characteristics are described, and the authors conclude with an overview of tools based on sequencing results (RNA-seq). The twenty-first chapter discusses the identification of miRNA targets based on comparative genomics, using animal miRNAs as examples. Due to the limited length of the seed region required for translational repression, the number of potential targets exceeds the validation capability of biological wet labs. Therefore the proposed approach is based on the conservation of the 3' untranslated region (3' UTR), of the miRNA, and of the interaction between the miRNA and its target. A conservation metric based on k-mer conservation is proposed and the tool PHYLIP, based on phylogenetic analysis, is presented. The twenty-second chapter discusses the design of non-miRNA ncRNAs, called small interfering RNAs (siRNAs). Following a brief biological background, the author presents siRNA-based design and machine learning approaches for the optimization of this process. The chapter concludes with tools based on accessibility-aided siRNA design. The book concludes with the twenty-third chapter, focusing on RNA-protein interactions.
First, the authors present the functional roles of RNA binding proteins and their contribution to mRNA translation or its degradation, to mRNA editing or to mRNA stability. Next, the most abundant RNA binding domains are reviewed together with the experimental method for detecting and quantifying these interactions. The chapter concludes with an overview of computational methods for predicting these binding sites. This book on RNA structure and function is suitable for both graduate and post-graduate students as well as established researchers. The concepts are presented clearly and with plenty of examples to facilitate understanding without requiring extensive prior knowledge; the book is also a valuable review of the state of the art in cutting-edge research directions. In addition, the value of each chapter is enriched by an extensive set of references to recent yet established papers, offering a reliable starting point for the literature on each research topic.
    RNA structure
    RNA folding
    non-coding RNA
    stochastic context free grammars
    multiple sequence alignments
    RNA motif
    phylogeny of RNA structure
    microRNA
    siRNA design
    RNA-protein interaction
