Data from: High-resolution methylome analysis uncovers stress-responsive genomic hotspots and drought-sensitive TE superfamilies in the clonal Lombardy poplar




DOI: 10.5281/zenodo.8428770; Zenodo: 8428770; MaRDI QID: Q6704419; FDO: Q6704419

Dataset published at Zenodo repository.

Cristian Peña-Ponton, Lauren McIntyre, Koen J.F. Verhoeven, Emanuele de Paoli, Lars Opgenoorth, Wim van der Putten, Paloma Perez-Bello, Barbara Diez-Rodriguez, Claude Becker, Katrin Heer

Publication date: 10 October 2023



This dataset contains the processed data presented in the article "High-resolution methylome analysis uncovers stress-responsive genomic hotspots and drought-sensitive TE superfamilies in the clonal Lombardy poplar".
Supplementary_methods.docx: contains detailed information on the experimental stress treatments, sequencing library preparation, sequencing, and DMR calling.
BedGraph files (CpG.bed, CHG.bed, CHH.bed): contain methylation levels (%) for each cytosine in the Lombardy poplar genome, in the respective sequence context. The first three columns give the genomic coordinates of the cytosine; the following 56 columns give the methylation level in each of the samples. Missing values (cytosines not captured by the sequencing method in a given sample) are represented as NA.
DMR_annotation_Populus_nigra_Italica_after_biotic_and_abiotic_treatments.txt: contains all identified regions that showed significant stress-induced differential methylation (DMRs). The file includes the full annotation of each DMR: genomic location, genomic feature, gene, TE, sequence context, and stress treatment, along with other relevant information.
sample_IDs_basic_metadata.txt: contains the sample ID and the associated metadata (stress treatment, ortet location, and ortet ID) for all samples used in the analysis.
Supplementary_file_1_metadata_samples.xlsx: contains the metadata associated with each sample, including sequencing statistics before and after quality and adapter trimming, read mapping and coverage statistics, and the number of interrogated cytosines in each sequence context (CpG, CHG, CHH).
Supplementary_file_2_GO_enrichments.xlsx: contains the complete results of the GO enrichment analysis for the gene sets associated with drought-CHH-DMRs, SINEs, MITEs, and SINEs + MITEs.
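The column layout described for CpG.bed, CHG.bed, and CHH.bed (three coordinate columns followed by one methylation column per sample, with NA for missing values) can be read with a few lines of Python. The sketch below is only an illustration: it assumes tab-separated fields and no header line, neither of which is stated in the description, and it uses a toy input with 3 sample columns and invented values instead of the 56 real ones. The function names are hypothetical and are not part of the dataset.

```python
import io


def parse_methylation_line(line):
    """Split one line into (chrom, start, end, levels).

    Assumes tab-separated fields. levels holds one float per sample,
    with None wherever the file contains NA.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    levels = [None if f == "NA" else float(f) for f in fields[3:]]
    return chrom, start, end, levels


def per_sample_mean(handle):
    """Mean methylation (%) per sample column, ignoring NA entries."""
    sums, counts = [], []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        _, _, _, levels = parse_methylation_line(line)
        if not sums:
            sums = [0.0] * len(levels)
            counts = [0] * len(levels)
        for i, value in enumerate(levels):
            if value is not None:
                sums[i] += value
                counts[i] += 1
    # A sample with no observed cytosines gets None instead of a mean.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy input: 2 cytosines, 3 samples (invented values, not real data).
demo = io.StringIO(
    "chr1\t100\t101\t80.0\tNA\t60.0\n"
    "chr1\t105\t106\t40.0\t20.0\tNA\n"
)
means = per_sample_mean(demo)
print(means)
```

To run this on a real file, the io.StringIO object would be replaced by an open file handle, for example open("CHH.bed"). If the files do carry a header line, it would have to be skipped first.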
italica_denovo_TE_280920.gff: contains the predicted TEs, obtained as follows. First, TEs were annotated de novo with the Extensive de-novo TE Annotator (EDTA, version 1.8.3; https://github.com/oushujun/EDTA) using default parameters, except for the option --sensitive 1, which uses RepeatModeler (version 2.0.1) to identify remaining TEs. All steps of the EDTA pipeline (filter, final, and anno) were selected in order to perform whole-genome annotation and analysis after the TE library was constructed. Then, in the annotated library from EDTA, we merged overlapping fragments and fragments located within a short distance (10 bp) of each other, in a strand-wise manner. Each merged fragment was annotated with the family of the longer of the fragments merged. Structural variants derived from nanopore data were used to redefine the boundaries of overlapping TE fragments and make the predictions more precise. LINE elements were identified independently with RepeatModeler in order to construct a more comprehensive de novo TE library.
SaliS.fasta: contains the consensus sequences of Salicaceae SINE families (SaliS). The file was built by extracting information from supplementary table 2 of the publication "Divergence of 3' ends as a driver of short interspersed nuclear element (SINE) evolution in the Salicaceae" (https://doi.org/10.1111/tpj.14721).
Pnigra_Italica_SaliS.bed: contains the SaliS copies annotated by blastn against the P. nigra Italica reference genome (-qcov_hsp_perc 90 -perc_identity 70 -word_size 7). Column headers: chr, start, end, length, strand, perc_identity, SaliS family.
Pnigra_Italica_all_TEs_for_anno.bed: contains the merged information from italica_denovo_TE_280920.gff and Pnigra_Italica_SaliS.bed. Column headers: chr, start, end, length, strand, perc_identity (only for SaliS), TE superfamily.
CXX_ortet_DMRs_merged.bed: contains DMRs merged from all pairwise DMR callings between two ortets. One file per sequence context.
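The fragment-merging step described for italica_denovo_TE_280920.gff (merge overlapping fragments and fragments at most 10 bp apart, separately per strand, and label the result with the family of the longer fragment) can be sketched as follows. This is not the authors' code. It assumes that the gap is measured as the next start minus the current end, and that "longer fragment" means the longest original fragment in a merged group. The Fragment type, the merge_fragments function, and the family labels are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type; coordinates and family labels are invented.
Fragment = namedtuple("Fragment", "chrom start end strand family")


def merge_fragments(fragments, max_gap=10):
    """Merge fragments on the same chromosome and strand that overlap
    or lie within max_gap bp of each other.

    The merged fragment keeps the family of its longest constituent
    (an assumed reading of the rule in the description).
    """
    merged = []
    # Sorting by (chrom, strand, start) keeps each strand's fragments adjacent.
    ordered = sorted(fragments, key=lambda f: (f.chrom, f.strand, f.start))
    for frag in ordered:
        length = frag.end - frag.start
        if merged:
            last = merged[-1]
            same_track = (last["chrom"], last["strand"]) == (frag.chrom, frag.strand)
            if same_track and frag.start - last["end"] <= max_gap:
                last["end"] = max(last["end"], frag.end)
                if length > last["longest"]:
                    last["longest"] = length
                    last["family"] = frag.family
                continue
        merged.append({
            "chrom": frag.chrom, "start": frag.start, "end": frag.end,
            "strand": frag.strand, "family": frag.family, "longest": length,
        })
    return [
        Fragment(m["chrom"], m["start"], m["end"], m["strand"], m["family"])
        for m in merged
    ]


toy = [
    Fragment("chr1", 100, 200, "+", "famA"),  # overlaps the next fragment
    Fragment("chr1", 150, 400, "+", "famB"),  # longest in its group
    Fragment("chr1", 408, 450, "+", "famC"),  # 8 bp gap: merged
    Fragment("chr1", 470, 520, "+", "famD"),  # 20 bp gap: kept separate
    Fragment("chr1", 420, 500, "-", "famE"),  # other strand: never merged
]
result = merge_fragments(toy)
for fragment in result:
    print(fragment)
```

Whether the gap is counted on closed or half-open coordinates shifts the threshold by one base, so the comparison would need adjusting to match the coordinate convention of the actual GFF file.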
Column headers of CXX_ortet_DMRs_merged.bed: chr, start, end, number of comparisons in which the DMR occurs, avg number of cytosines, avg differential methylation vs. control, avg adjusted p value, avg DMR length (averages apply when the DMR was called in multiple DMR callings).
SCRIPTS
cov_filtering.sh: filters individual positions according to a custom threshold.
unionbedg_with_NAs.sh: merges information from different samples into a single file, taking into account the percentage of missing values per position across the given samples.
anovas_and_contrasts_boxplots_barplots_cld.r: performs statistical tests for the effects of treatments and ortets on the average global methylation. Each sequence context was analyzed separately.
CHH_noise_filter.sh: removes cytosines with invariable methylation values across 90% of the samples.
GlobalMethAvg_calculation.r: calculates the global average methylation given a methylation file (CpG.bed, CHG.bed, or CHH.bed) and a sample file.
Hclustering_and_PCAs_analysis.r: performs hierarchical clustering and principal component analysis, and plots the respective figures.
ICC_matrices_analysis.r: calculates intraclass correlation coefficients among all pairwise combinations and plots colored grids.
Annotations are based on the de novo reference genome of the Populus nigra cv. Italica clone, uploaded under the ENA project PRJEB44889 (www.ebi.ac.uk/ena/browser/view/GCA_950102115). Bisulfite sequencing data can be found under the ENA project PRJEB51831.
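The rule stated for CHH_noise_filter.sh (remove cytosines with invariable methylation values across 90% of the samples) leaves some details open. The sketch below shows one possible reading in Python and does not reproduce the shell script. It assumes that a cytosine counts as invariable when a single methylation value is shared by at least 90% of all samples, that NA entries count toward the total but never toward the shared value, and that a cytosine with only NA values is dropped. The function names and the toy values are hypothetical.

```python
from collections import Counter


def is_invariable(levels, min_fraction=0.9):
    """Return True if one value is shared by >= min_fraction of samples.

    levels holds one methylation value per sample, with None for NA.
    The fraction is taken over all samples, NA included (an assumption).
    """
    observed = [v for v in levels if v is not None]
    if not observed:
        return True  # nothing observed: treated as uninformative
    most_common_count = Counter(observed).most_common(1)[0][1]
    return most_common_count / len(levels) >= min_fraction


def filter_noise(rows, min_fraction=0.9):
    """Keep only (position, levels) rows whose levels vary enough."""
    return [row for row in rows if not is_invariable(row[1], min_fraction)]


# Toy rows with 10 samples each (invented values, not real data).
rows = [
    (("chr1", 10, 11), [0.0] * 10),                    # constant: removed
    (("chr1", 20, 21), [0.0] * 9 + [50.0]),            # 90% identical: removed
    (("chr1", 30, 31), [0.0] * 5 + [10.0, 20.0, 30.0, 40.0, None]),  # kept
]
kept = filter_noise(rows)
print([position for position, _ in kept])
```

Changing min_fraction, or computing the fraction over non-missing samples only, would give a stricter or looser filter; which variant matches the published analysis can only be checked against the script itself.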






