seeker
Simplified Fetching and Processing of Microarray and RNA-Seq Data
Josh Schoenbachler, Jake Hughey
Last update: 22 January 2024
License: MIT + file LICENSE
Software version identifiers: 1.0.9, 1.0.10, 1.0.13, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5
Wrapper around various existing tools and command-line interfaces, providing a standard interface, simple parallelization, and detailed logging. For microarray data, maps probe sets to standard gene IDs, building on 'GEOquery' Davis and Meltzer (2007) <doi:10.1093/bioinformatics/btm254>, 'ArrayExpress' Kauffmann et al. (2009) <doi:10.1093/bioinformatics/btp354>, Robust Multi-array Average 'RMA' Irizarry et al. (2003) <doi:10.1093/biostatistics/4.2.249>, and 'BrainArray' Dai et al. (2005) <doi:10.1093/nar/gni179>. For RNA-seq data, it does the following:

- fetches metadata and raw reads from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA)
- performs standard adapter and quality trimming using 'TrimGalore' Krueger <https://github.com/FelixKrueger/TrimGalore>
- performs quality control checks using 'FastQC' Andrews <https://github.com/s-andrews/FastQC>
- quantifies transcript abundances using 'salmon' Patro et al. (2017) <doi:10.1038/nmeth.4197>, optionally with 'refgenie' Stolarczyk et al. (2020) <doi:10.1093/gigascience/giz149>
- aggregates the results using 'MultiQC' Ewels et al. (2016) <doi:10.1093/bioinformatics/btw354>
- maps transcripts to genes using 'biomaRt' Durinck et al. (2009) <doi:10.1038/nprot.2009.97>
- summarizes transcript-level quantifications for gene-level analyses using 'tximport' Soneson et al. (2015) <doi:10.12688/f1000research.7563.2>
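The final RNA-seq step collapses transcript-level quantifications to gene level. The sketch below illustrates that step in Python rather than R, for a single sample. It follows the default aggregation in 'tximport' as described in its documentation:

- estimated counts and TPM are summed over the transcripts of each gene
- gene length is the abundance-weighted mean of the transcripts' effective lengths

This is not seeker's or tximport's API. The function name `summarize_to_gene`, the dict-based input format, and the example transcript and gene IDs are all hypothetical. The zero-abundance fallback is also a simplification; tximport handles that case differently across samples.

```python
from collections import defaultdict


def summarize_to_gene(quant, tx2gene):
    """Hypothetical sketch: aggregate one sample's transcript quantifications to genes.

    quant:   dict transcript_id -> (estimated_counts, tpm, effective_length)
    tx2gene: dict transcript_id -> gene_id (e.g., built from a biomaRt query)
    Returns: dict gene_id -> (counts, tpm, length)
    """
    counts = defaultdict(float)
    tpm = defaultdict(float)
    weighted_len = defaultdict(float)  # running sum of tpm * effective_length per gene
    lengths = defaultdict(list)        # raw lengths, used only by the zero-abundance fallback

    for tx, (c, a, length) in quant.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue  # in this sketch, transcripts without a gene mapping are dropped
        counts[gene] += c
        tpm[gene] += a
        weighted_len[gene] += a * length
        lengths[gene].append(length)

    result = {}
    for gene in counts:
        if tpm[gene] > 0:
            # Abundance-weighted mean of transcript lengths
            gene_len = weighted_len[gene] / tpm[gene]
        else:
            # Simplification: unweighted mean when the gene has zero abundance
            gene_len = sum(lengths[gene]) / len(lengths[gene])
        result[gene] = (counts[gene], tpm[gene], gene_len)
    return result


quant = {
    "tx1": (100.0, 30.0, 1000.0),
    "tx2": (50.0, 10.0, 2000.0),
    "tx3": (0.0, 0.0, 500.0),
    "tx4": (7.0, 1.0, 800.0),  # not in tx2gene, so it is dropped
}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(summarize_to_gene(quant, tx2gene))
# {'geneA': (150.0, 40.0, 1250.0), 'geneB': (0.0, 0.0, 500.0)}
```

Weighting length by abundance means the gene length reflects the isoforms that are actually expressed. Here, geneA's length of 1250 is pulled toward tx1, the more abundant transcript.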
- GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor
- Importing ArrayExpress datasets into R/Bioconductor
- Exploration, normalization, and summaries of high density oligonucleotide array probe level data
- Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
- Salmon provides fast and bias-aware quantification of transcript expression
- Refgenie: a reference genome resource manager
- MultiQC: summarize analysis results for multiple tools and samples in a single report
- Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt
- Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences