Deep sequencing data analysis (Q2034478)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Deep sequencing data analysis | scientific article | |
Statements
Deep sequencing data analysis (English)
22 June 2021
The book is structured in 19 chapters and covers a wide variety of topics focused on sequencing; the collection is timely, given the exponential expansion of high-throughput sequencing and the significant advances made possible by hypotheses extracted from such data. The diversity of topics also recommends the book as an ideal starting point for a broader overview of the field. The book commences with a case study of whole genome sequencing. The authors present approaches for detecting causal variants in Mendelian disorders and include a detailed overview of the variant detection workflow and standard variation types; also included are a series of variant annotation algorithms and tips for reporting and interpreting the outputs. The second chapter approaches the topic from another angle and puts forward statistical considerations for inferring copy number variations. The author complements the computational approaches with statistical model-based approaches, including hidden Markov models, the shifting level model and the change point model. The third chapter changes gears and focuses on single cell data; an essential step of the analysis is the clustering of cells (and the subsequent identification/prediction of cell types). Following an overview of the differences observed between bulk and single cell sequencing, the authors focus on individual algorithms, providing details on both the methods and the statistical measures for comparing the algorithms. In Chapter 4 the authors address the essential question of comparing datasets, usually against public repositories; as the size of these public datasets increases rapidly, so does the need for a systematic way of comparing expression. The authors give an overview of some public resources (such as TCGA, GTEx, CCLE and others); next they focus on the analysis pipeline and the required steps, e.g. batch correction or differential expression analysis.
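To make the change point idea mentioned for the second chapter concrete, the following is a minimal sketch, not the book's method: it finds a single change point in a series of values by minimising the within-segment squared error. The function names and the toy log2 read-depth ratios are hypothetical and chosen only for illustration; practical copy number callers handle multiple change points, noise models and normalisation.

```python
def segment_sse(values):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_change_point(signal):
    """Return (index, sse) for the split minimising total within-segment error.

    The left segment is signal[:index] and the right segment is signal[index:].
    This is a brute-force least-squares search for ONE change point only.
    """
    values = [float(v) for v in signal]
    if len(values) < 2:
        raise ValueError("need at least two observations")
    best_index, best_sse = None, float("inf")
    for k in range(1, len(values)):
        sse = segment_sse(values[:k]) + segment_sse(values[k:])
        if sse < best_sse:
            best_index, best_sse = k, sse
    return best_index, best_sse


# Hypothetical toy data: ratios near 0 (no change), followed by a gain.
toy_ratios = [0.0, 0.1, -0.1, 0.0, 0.6, 0.5, 0.7, 0.6]
index, sse = best_single_change_point(toy_ratios)
print(index, round(sse, 4))
```

In this toy series the search places the boundary between the fourth and fifth values, where the mean level shifts.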
In Chapter 5 the author focuses on optimising the identification of hitting sets (sets of common string patterns); the DOCKS (design of compact k-mer sets) tool is presented in detail, interleaving theory and examples. In the next chapter, the focus shifts to whole-metagenome shotgun sequencing; approaches for assembly and for community and functional profiling are discussed in detail. The trend continues in Chapter 7, which gives an overview of another approach to microbiome analysis, 16S amplicon sequencing. The steps of experimental processing (including the design and sample processing) are followed by the steps of the bioinformatics pipeline (comprising demultiplexing, denoising, identification of contaminating transcripts, generation of the phylogenetic tree and rarefaction of the data). Chapter 8 focuses on RNA-seq experiments on non-model organisms; while sharing some of the hurdles presented in previous case studies, such as optimising the sequencing design, de novo assembly of transcripts and QC filtering of reads, this task also raises specific obstacles in alignment, expression summary, differential expression analysis and interpretation (enrichment analyses); all of these steps are discussed in detail and accompanied by numerous references and comparative reviews. Chapter 9 brings deep learning (DL) approaches into the detection of cancer using microbiome sequencing input. An overview of applications of DL in genomics precedes a case study in which comments on data loading and manipulation, and on training efficiency and speed, complement the interpretation of the results. Chapter 10 represents a change in modality: the landscape of accessible chromatin, assessed using ATAC-seq experiments, can be evaluated using the workflow proposed in this chapter; all technical details, from required packages to genomic sequences and corresponding annotations, are included.
All steps of the pipeline and the assessment approaches are discussed in detail. In Chapter 11 an approach for genome-wide non-invasive prenatal diagnosis is proposed; this is based on cfDNA analysis using Hoobari. The computational pipeline for variant calling is described in detail; the identification of mutation loci and a Bayesian algorithm for variant calling are also discussed. Another application of Hoobari is presented in Chapter 12; the close proximity of the two applications strengthens the description of the algorithm. Chapter 13 presents a GWAS approach that promises an accurate imputation of untyped variants. The chapter commences with an overview of imputation methods, followed by approaches to estimate the resulting accuracies. In Chapter 14, the authors focus on a multi-region sequence analysis approach for predicting heterogeneity and clonal evolution; the inference of the phylogenetic tree, the stability to subsampling and other properties are discussed in detail and are emphasised by hands-on examples. Chapter 15 presents yet another angle, that of overcoming the interpretability difficulties of cancer classification tasks (the CAM methodology is described at length). Returning to transcriptome profiling, in Chapter 16 the authors present a standard pipeline for the analysis of single cell data; the numerous examples and plots make this an accessible introduction to the topic. Chapter 17 addresses experimental design from a biological perspective; sample variability, sample size and the importance of randomisation and batches are discussed in detail. Chapter 18 focuses on non-coding RNAs (microRNAs) in a single cell setting; the characteristics of single cell resolution for this type of assay are illustrated on the Wang et al. dataset.
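As an illustration of how imputation accuracy, the subject of Chapter 13, can be estimated, the sketch below computes the squared Pearson correlation between masked true genotypes and imputed dosages, a commonly used measure. It is an assumption that this measure is among those the chapter covers; the function name and the toy values are hypothetical.

```python
def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2 allele
    counts) and imputed dosages (real numbers in [0, 2]) at one variant."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("inputs must have equal length of at least two")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # Correlation is undefined when either input is constant.
        raise ValueError("correlation undefined for a constant input")
    return cov * cov / (var_t * var_d)


# Hypothetical toy data: genotypes masked before imputation, then compared.
true_toy = [0, 1, 2, 1, 0, 2]
imputed_toy = [0.1, 0.9, 1.8, 1.2, 0.2, 1.9]
r2 = squared_correlation(true_toy, imputed_toy)
print(round(r2, 3))
```

Values close to 1 indicate that the imputed dosages track the true genotypes well; the measure is undefined for monomorphic variants, which is why the sketch raises an error in that case.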
The book concludes with a chapter on data collection and analysis in the forensic arena; the structure and applications of national DNA databases, as well as DNA privacy rights in the USA, are discussed. One distinctive advantage of this collection of chapters is the wealth of information provided, both as details and example case studies, together with the extensive and well-curated set of references. While textbooks on this topic are yet to be written, this book is an excellent initial overview that balances depth of information with an essential broad summary.
deep sequencing
causal variants
Mendelian disorder
copy number variation
community detection algorithms
small universal k-mer hitting set
whole-metagenome shotgun sequencing
microbiome sequencing
16S amplicon sequencing
deep learning
chromatin landscape
ATACseq
non-invasive prenatal diagnosis
single nucleotide polymorphism (SNP)
de novo mutations
imputation of untyped variants
multi-region sequence analysis
intra-tumour heterogeneity
clonal evolution
cancer classification
single cell transcriptome profiling
RNA-seq experimental design
single cell microRNA analysis
forensic analysis