Data for Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage

From MaRDI portal
Dataset:6694911



DOI: 10.5281/zenodo.10381745
Zenodo ID: 10381745
MaRDI QID: Q6694911
FDO: Q6694911

Dataset published in the Zenodo repository.

Timothy C. Howton, Amanda D. Clark, Victoria L. Flanary, Emma F. Jones, Brittany N. Lasseigne

Publication date: 14 December 2023

Copyright license: MIT license



data_minus_bam.tar.gz contains all files from the data directory of the 230227_EJ_MouseBrainIsoDiv GitHub project except the BAM outputs. It is organized as follows:

- comparison_gene_lists/: The RData file in this directory contains all comparison gene lists with differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU) results. Import it into the R environment to reproduce the analyses.
  - all_comparison_gene_lists.Rdata
- cpm_out/: The RData file in this directory contains the processed counts per million (CPM) and formatted metadata for downstream analyses.
  - cpm_counts_metadata.RData
- deseq2_data/: Every file in this directory is an Rds file containing DESeq2 results for the study design indicated in its file name.
  - A file name that includes "gene" denotes a gene-level analysis, and "transcript" denotes a transcript-level analysis.
  - A file name with two regions is a comparison between those two regions. A file name with one region denotes either that region vs. all others or male vs. female.
  - Any file name that includes "sex" is a male vs. female comparison in the indicated region(s).
  - all_regions_sex_gene_results.Rds
  - all_regions_sex_transcript_results.Rds
  - cerebellum_cortex_results.Rds
  - cerebellum_cortex_transcripts_results.Rds
  - cerebellum_gene_results.Rds
  - cerebellum_hippocampus_results.Rds
  - cerebellum_hippocampus_transcripts_results.Rds
  - cerebellum_sex_gene_results.Rds
  - cerebellum_sex_transcript_results.Rds
  - cerebellum_striatum_results.Rds
  - cerebellum_striatum_transcripts_results.Rds
  - cerebellum_transcript_results.Rds
  - cortex_gene_results.Rds
  - cortex_hippocampus_results.Rds
  - cortex_hippocampus_transcripts_results.Rds
  - cortex_sex_gene_results.Rds
  - cortex_sex_transcript_results.Rds
  - cortex_striatum_results.Rds
  - cortex_striatum_transcripts_results.Rds
  - cortex_transcript_results.Rds
  - hippocampus_gene_results.Rds
  - hippocampus_sex_gene_results.Rds
  - hippocampus_sex_transcript_results.Rds
  - hippocampus_striatum_transcripts_results.Rds
  - hippocampus_transcript_results.Rds
  - striatum_gene_results.Rds
  - striatum_hippocampus_results.Rds
  - striatum_hippocampus_transcripts_results.Rds
  - striatum_sex_gene_results.Rds
  - striatum_sex_transcript_results.Rds
  - striatum_transcript_results.Rds
- gencode_annotations/: This directory contains the exact GENCODE genome and transcriptome annotations used for our analyses.
  - GRCm39.primary_assembly.genome.fa
  - GRCm39.primary_assembly.genome.fa.fai
  - gencode.vM31.primary_assembly.annotation.gtf
- gffread/: This directory contains the generated FASTA files with the exact isoform sequences for novel and annotated genes. These files are required for creating IsoformSwitchAnalyzeR objects.
  - isoform_sequences.fa
  - isoform_sequences_linear.fa
- nextflow/: All files in this directory and its subdirectories are direct outputs of the nf-core/nanoseq pipeline. For details on the pipeline outputs, please refer to https://nf-co.re/nanoseq/3.1.0/docs/output
  - bambu/
    - counts_gene.txt
    - counts_transcript.txt
    - extended_annotations.gtf
    - extended_annotations.gtf.idx
    - versions.yml
  - fastqc/
    - There are 2 files for each of the 40 samples. The files expected for each sample are shown below, using sample01_R1 as a representative example:
      - sample01_R1_1_fastqc.html
      - sample01_R1_1_fastqc.zip
  - minimap2/
    - bam/: This directory has been removed to save space. Please contact us for more information.
    - bigBed/
      - There is 1 file for each of the 40 samples. The file expected for each sample is shown below, using sample01_R1 as a representative example:
        - sample01_R1.bigBed
    - bigWig/
      - There is 1 file for each of the 40 samples. The file expected for each sample is shown below, using sample01_R1 as a representative example:
        - sample01_R1.bigWig
    - genome/
      - GRCm39.primary_assembly.genome.fa.mmi
    - samtools_stats/
      - There are 3 files for each of the 40 samples. The files expected for each sample are shown below, using sample01_R1 as a representative example:
        - sample01_R1.sorted.bam.flagstat
        - sample01_R1.sorted.bam.idxstats
        - sample01_R1.sorted.bam.stats
  - multiqc/
    - multiqc_data/
      - mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.txt
      - mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.txt
      - mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.txt
      - mqc_samtools-idxstats-xy-plot_1.txt
      - mqc_samtools_alignment_plot_1.txt
      - multiqc.log
      - multiqc_data.json
      - multiqc_general_stats.txt
      - multiqc_samtools_flagstat.txt
      - multiqc_samtools_idxstats.txt
      - multiqc_samtools_stats.txt
      - multiqc_sources.txt
    - multiqc_plots/
      - pdf/
        - mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.pdf
        - mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.pdf
        - mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.pdf
        - mqc_samtools-idxstats-xy-plot_1.pdf
        - mqc_samtools-idxstats-xy-plot_1_pc.pdf
        - mqc_samtools_alignment_plot_1.pdf
        - mqc_samtools_alignment_plot_1_pc.pdf
      - png/
        - *The same MultiQC plots as the pdf directory, but in PNG format*
      - svg/
        - *The same MultiQC plots as the pdf and png directories, but in SVG format*
    - multiqc_report.html
    - versions.yml
  - nanoplot/
    - fastq/
      - Contains 40 directories, one per sample. Each directory contains 12 files produced by running NanoPlot within the nf-core/nanoseq pipeline. A representative example follows:
        - sample01_R1/
          - Dynamic_Histogram_Read_length.html
          - HistogramReadlength.png
          - LengthvsQualityScatterPlot_dot.png
          - LengthvsQualityScatterPlot_kde.png
          - LogTransformed_HistogramReadlength.png
          - NanoPlot-report.html
          - NanoPlot_20230413_1600.log
          - NanoPlot_20230413_2047.log
          - NanoStats.txt
          - Weighted_HistogramReadlength.png
          - Weighted_LogTransformed_HistogramReadlength.png
          - Yield_By_Length.png
  - pipeline_info/
    - execution_report_2023-04-13_15-46-11.html
    - execution_timeline_2023-04-13_15-46-11.html
    - execution_trace_2023-04-13_10-59-24.txt
    - execution_trace_2023-04-13_15-46-11.txt
    - pipeline_dag_2023-04-13_15-46-11.svg
    - samplesheet.valid.csv
    - software_versions.yml
- switchlist_fasta/: This directory contains the generated amino acid (AA) and nucleotide (nt) FASTA files for the individual IsoformSwitchAnalyzeR objects. These files are required for downstream analyses.
  - cerebellum_AA.fasta
  - cerebellum_nt.fasta
  - cerebellum_sex_AA.fasta
  - cerebellum_sex_nt.fasta
  - cortex_AA.fasta
  - cortex_nt.fasta
  - cortex_sex_AA.fasta
  - cortex_sex_nt.fasta
  - hippocampus_AA.fasta
  - hippocampus_nt.fasta
  - region_region_AA.fasta
  - region_region_nt.fasta
  - striatum_AA.fasta
  - striatum_nt.fasta
  - striatum_sex_AA.fasta
  - striatum_sex_nt.fasta
- switchlist_objects/: This directory contains intermediate and final IsoformSwitchAnalyzeR objects. The file name prefixes mean the following:
  - "region_all" denotes a list of four switchlists. Each compares a single brain region (cerebellum, cortex, hippocampus, striatum) with all other regions in aggregate.
  - "region_sex" denotes a list of four switchlists, one per region (cerebellum, cortex, hippocampus, striatum), comparing males and females.
  - "region_region" denotes a single switchlist that includes all pairwise region comparisons.
  - "sex" without a region denotes a male vs. female comparison across all regions in aggregate.
  - de_added/: This directory contains final IsoformSwitchAnalyzeR objects that incorporate open reading frame (ORF) and differential expression (DE) results.
    - region_all_switchlist_list_orf_de.Rds
    - region_region_orf_de.Rds
    - region_sex_switchlist_list_orf_de.Rds
  - orf_added/: This directory contains intermediate and final IsoformSwitchAnalyzeR objects with open reading frame information added.
    - region_all_switchlist_list.Rds
    - region_region_switchlist_analyzed.Rds
    - region_sex_switchlist_list.Rds
    - sex_switchlist_analyzed.Rds
  - pfam_added/: This directory contains final IsoformSwitchAnalyzeR objects with ORF and DE information plus added protein domain information. Please note that Pfam does not comprehensively identify all protein domains for every gene.
    - region_all_list_orf_de_pfam.Rds
    - region_region_orf_de_pfam.Rds
    - region_sex_list_orf_de_pfam.Rds
  - raw/: This directory contains the initial IsoformSwitchAnalyzeR objects, with no additional information added.
    - region_all_switchlist_list.Rds
    - region_region_switchlist_analyzed.Rds
    - region_sex_switchlist_list.Rds
    - sex_switchlist.Rds
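The cpm_out/ directory holds the authors' processed counts per million. The sketch below shows only the basic library-size scaling, CPM = count / (sample total) × 10^6, recomputed from a raw count table such as nextflow/bambu/counts_transcript.txt. It makes two assumptions. First, the table is tab-separated with a header row, two leading identifier columns (for example TXNAME and GENEID), and then one column per sample. Second, the authors' own processing, which may include filtering or other steps, is not reproduced here.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_count_table(
    lines: Iterable[str], n_id_cols: int = 2
) -> Tuple[List[Tuple[str, ...]], Dict[str, List[float]]]:
    """Parse a tab-separated count table into row IDs and per-sample columns.

    Assumption: the header's first `n_id_cols` fields are identifier columns
    and the remaining header fields are sample names.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[n_id_cols:]
    row_ids: List[Tuple[str, ...]] = []
    columns: Dict[str, List[float]] = {s: [] for s in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_ids.append(tuple(row[:n_id_cols]))
        for sample, value in zip(samples, row[n_id_cols:]):
            columns[sample].append(float(value))  # bambu counts may be fractional
    return row_ids, columns


def counts_per_million(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each sample's counts so that they sum to one million."""
    cpm: Dict[str, List[float]] = {}
    for sample, values in columns.items():
        total = sum(values)
        if total == 0:
            raise ValueError(f"sample {sample!r} has zero total counts")
        cpm[sample] = [v / total * 1e6 for v in values]
    return cpm
```

Usage: open the extracted nextflow/bambu/counts_transcript.txt and pass the file handle to `read_count_table`, then pass the resulting columns to `counts_per_million`.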

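The deseq2_data/ naming rules described above can be expressed as a small decoder. This is a sketch under one stated assumption. The pairwise gene-level files (for example cerebellum_cortex_results.Rds) contain neither "gene" nor "transcript", so the decoder treats any file without "transcript" as gene level. The returned labels (`region_vs_all` and so on) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

REGIONS = ("cerebellum", "cortex", "hippocampus", "striatum")


@dataclass(frozen=True)
class Deseq2Comparison:
    level: str                # "gene" or "transcript"
    design: str               # "region_vs_region", "region_vs_all", or "male_vs_female"
    regions: Tuple[str, ...]  # regions named in the file (all four for "all_regions")


def describe_deseq2_file(filename: str) -> Deseq2Comparison:
    stem = filename[: -len(".Rds")] if filename.endswith(".Rds") else filename
    tokens = stem.split("_")
    # "transcript" or "transcripts" marks transcript level. Assumption: every
    # other file, including pairwise files that name neither level, is gene level.
    level = "transcript" if any(t.startswith("transcript") for t in tokens) else "gene"
    if tokens[:2] == ["all", "regions"]:
        regions = REGIONS
    else:
        regions = tuple(t for t in tokens if t in REGIONS)
    if "sex" in tokens:
        design = "male_vs_female"    # male vs. female in the named region(s)
    elif len(regions) == 2:
        design = "region_vs_region"  # comparison between the two named regions
    elif len(regions) == 1:
        design = "region_vs_all"     # one region vs. all others
    else:
        raise ValueError(f"unrecognized DESeq2 file name: {filename!r}")
    return Deseq2Comparison(level, design, regions)
```

Applying `describe_deseq2_file` to every name in the listing gives a quick check that each file falls into exactly one study design.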

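The gencode_annotations/ directory ships GRCm39.primary_assembly.genome.fa.fai next to the genome FASTA. A samtools-style FASTA index has five tab-separated columns: name, length, byte offset of the first base, bases per line, and bytes per line. Together these allow random access into the genome without reading it all. The sketch below parses those columns and fetches a 0-based, half-open region. It is a minimal illustration, not a replacement for samtools faidx. It assumes the FASTA lines are uniformly wrapped, which faidx itself requires.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, Iterable


@dataclass(frozen=True)
class FaiEntry:
    name: str
    length: int     # sequence length in bases
    offset: int     # byte offset of the sequence's first base in the FASTA
    linebases: int  # bases per full sequence line
    linewidth: int  # bytes per full sequence line, including the newline


def read_fai(lines: Iterable[str]) -> Dict[str, FaiEntry]:
    """Map sequence name -> index entry from the lines of a .fai file."""
    index: Dict[str, FaiEntry] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, length, offset, linebases, linewidth = fields[:5]
        index[name] = FaiEntry(name, int(length), int(offset), int(linebases), int(linewidth))
    return index


def byte_offset(entry: FaiEntry, pos: int) -> int:
    """Byte position of 0-based base `pos`, accounting for line-ending bytes."""
    return entry.offset + (pos // entry.linebases) * entry.linewidth + pos % entry.linebases


def fetch(handle: BinaryIO, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases [start, end) of `entry` read from a binary FASTA handle."""
    if not 0 <= start < end <= entry.length:
        raise ValueError(f"region {start}-{end} outside {entry.name} (length {entry.length})")
    first = byte_offset(entry, start)
    last = byte_offset(entry, end - 1)
    handle.seek(first)
    raw = handle.read(last - first + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


if __name__ == "__main__":
    # Toy FASTA: ">s\n" is 3 bytes, so the first base sits at byte offset 3.
    fasta = io.BytesIO(b">s\nACGT\nTT\n")
    idx = read_fai(["s\t6\t3\t4\t5\n"])
    print(fetch(fasta, idx["s"], 1, 5))  # CGTT
```

For the real genome, open the .fai in text mode and pass it to `read_fai`. Open the .fa in binary mode ("rb") for `fetch`, because the index stores byte offsets.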

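The gffread/ directory contains both isoform_sequences.fa and isoform_sequences_linear.fa. We assume, without confirmation from the listing, that the "linear" file holds the same records with each sequence on a single line instead of wrapped across several lines. Under that assumption, the conversion can be sketched as follows:

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def linearize(lines: Iterable[str]) -> Iterator[str]:
    """Re-emit each record as exactly two lines: the header, then the full sequence."""
    for header, seq in read_fasta(lines):
        yield f">{header}\n"
        yield seq + "\n"
```

Usage: open the input for reading and an output file for writing, then call `dst.writelines(linearize(src))`. Single-line records are convenient for line-oriented tools that expect each sequence on one line.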

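The nextflow/ listing gives per-sample file counts using sample01_R1 as the example. The sketch below turns those examples into a completeness check after extracting the archive. The file name templates are hypothetical generalizations of the sample01_R1 names shown above. The list of 40 sample IDs must be supplied by the user (for example, taken from pipeline_info/samplesheet.valid.csv, whose column layout is not assumed here). minimap2/bam/ is left out because it was removed from the archive.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical templates generalized from the sample01_R1 examples in the
# listing; "{s}" is a sample ID. Paths are relative to the nextflow/ directory.
PER_SAMPLE_TEMPLATES = {
    "fastqc": ["{s}_1_fastqc.html", "{s}_1_fastqc.zip"],
    "minimap2/bigBed": ["{s}.bigBed"],
    "minimap2/bigWig": ["{s}.bigWig"],
    "minimap2/samtools_stats": [
        "{s}.sorted.bam.flagstat",
        "{s}.sorted.bam.idxstats",
        "{s}.sorted.bam.stats",
    ],
    # NanoPlot log names carry timestamps, so only fixed names are checked.
    "nanoplot/fastq": ["{s}/NanoPlot-report.html", "{s}/NanoStats.txt"],
}


def expected_files(sample_ids: Iterable[str]) -> List[str]:
    """List the per-sample relative paths the templates predict."""
    return [
        f"{subdir}/{template.format(s=sample)}"
        for sample in sample_ids
        for subdir, templates in PER_SAMPLE_TEMPLATES.items()
        for template in templates
    ]


def missing_files(nextflow_dir: Path, sample_ids: Iterable[str]) -> List[str]:
    """Return the expected relative paths that are not regular files on disk."""
    return [rel for rel in expected_files(sample_ids) if not (nextflow_dir / rel).is_file()]
```

An empty result from `missing_files(Path("nextflow"), sample_ids)` means every templated file is present. Any names it returns point to an incomplete extraction or to a naming pattern that differs from the assumed templates.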

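Each IsoformSwitchAnalyzeR object in switchlist_fasta/ has an amino acid (_AA) and a nucleotide (_nt) FASTA file that share a prefix. A minimal sketch that groups the files by prefix and reports any prefix missing one of the two is shown below. It assumes that every file name ends in _AA.fasta or _nt.fasta, as in the listing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def pair_switchlist_fastas(filenames: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group names like 'cortex_sex_AA.fasta' as {'cortex_sex': {'AA': ..., 'nt': ...}}."""
    pairs: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        if not name.endswith(".fasta"):
            continue
        prefix, _, kind = name[: -len(".fasta")].rpartition("_")
        if kind not in ("AA", "nt"):
            raise ValueError(f"expected an _AA or _nt suffix: {name!r}")
        pairs[prefix][kind] = name
    return dict(pairs)


def unpaired(pairs: Dict[str, Dict[str, str]]) -> List[str]:
    """Prefixes that lack either the amino acid or the nucleotide file."""
    return sorted(p for p, kinds in pairs.items() if set(kinds) != {"AA", "nt"})
```

For the 16 files listed above, every prefix (for example region_region and striatum_sex) has both members, so `unpaired` returns an empty list.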

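The switchlist_objects/ subdirectories add information cumulatively. raw/ has none, orf_added/ adds ORFs, de_added/ adds ORFs and DE, and pfam_added/ adds ORFs, DE, and Pfam domains. The file name prefix gives the study design. The sketch below decodes a relative path into (stage, design, annotations) and picks the most fully annotated file for a design. The annotation labels and helper names are hypothetical.

```python
from typing import Iterable, Tuple

# Cumulative annotation layers per stage, following the directory descriptions.
STAGE_ANNOTATIONS = {
    "raw": (),
    "orf_added": ("orf",),
    "de_added": ("orf", "de"),
    "pfam_added": ("orf", "de", "pfam"),
}

# Study-design prefixes described for switchlist_objects/.
DESIGN_PREFIXES = ("region_all", "region_sex", "region_region", "sex")


def describe_switchlist(relpath: str) -> Tuple[str, str, Tuple[str, ...]]:
    """Return (stage, design, annotations) for a path like 'raw/sex_switchlist.Rds'."""
    stage, _, filename = relpath.partition("/")
    if stage not in STAGE_ANNOTATIONS:
        raise ValueError(f"unknown stage directory: {stage!r}")
    for prefix in DESIGN_PREFIXES:
        if filename.startswith(prefix + "_"):
            return stage, prefix, STAGE_ANNOTATIONS[stage]
    raise ValueError(f"unrecognized switchlist file name: {filename!r}")


def most_annotated(relpaths: Iterable[str], design: str) -> str:
    """Pick the file for `design` that carries the most annotation layers."""
    candidates = [p for p in relpaths if describe_switchlist(p)[1] == design]
    if not candidates:
        raise ValueError(f"no switchlist files for design {design!r}")
    return max(candidates, key=lambda p: len(describe_switchlist(p)[2]))
```

According to the listing, the aggregate "sex" design appears only in raw/ and orf_added/. For that design, `most_annotated` therefore resolves to orf_added/sex_switchlist_analyzed.Rds, while the three region designs resolve to their pfam_added/ files.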
