Grass Phylogeny Working Group III: data repository (Q6687751)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| default for all languages | No label defined | | |
| English | Grass Phylogeny Working Group III: data repository | Dataset published at Zenodo repository. | |
Statements
Grass Phylogeny Working Group III: data repository

Phylogenetic analyses of the grass family (Poaceae) using nuclear and plastid data. The data set includes 1153 accessions corresponding to 1133 accepted species. Genomic data were obtained from different sources, including target capture, shotgun, transcriptomes and annotated genomes. Nuclear markers (Angiosperm353 gene set) were assembled from short-read data using HybPiper or a custom assembly pipeline optimized for low-coverage shotgun data. Plastid genes were either retrieved from published plastome sequences or assembled here using getOrganelle. This data set also includes the results of a gene tree-species tree reconciliation analysis using GeneRax.

Contact persons: Matheus E. Bianconi (matheus-enrique.bianconi@univ-tlse3.fr), Jan Hackel (jan.hackel@uni-marburg.de), Maria S. Vorontsova (m.vorontsova@kew.org)

Content description

1. Metadata

gpwgIII_samples_metadata_taxonomy.tsv
Tab-separated file with details for all 1,702 accessions used in this study. Columns:
- analysis_ID: ID in nuclear analyses
- analysis_ID_plastome: ID in plastome analyses
- acc_species: accepted species name
- acc_species_author: taxonomic species authority
- acc_genus: accepted genus name
- acc_genus_author: taxonomic genus authority
- publication: associated prior publication
- data type: type of sequence data
- isolate: laboratory isolate ID
- voucher_ID: herbarium voucher ID
- germplasm_ID: germplasm collection ID
- repo_accession: accession number in public repository
- plastome_accession: accession number of assembled plastome sequence
- removed_nuclear: reason for removal from nuclear tree, if applicable
- removed_plastome: reason for removal from plastome tree, if applicable
- soreng2022_genus: genus name in Soreng et al. 2022, https://doi.org/10.1111/jse.12847
- subtribe, tribe, subfamily, major.clade: classification according to Soreng et al. 2022

2. Nuclear data

- Dataset1 ("main")
  Number of samples: 1153
  Number of genes: 331
  Alignment trimming threshold: gt = 0.1 (removed sites with >90% missing data)
  Genes per sample: 166
- Dataset2 ("strict trimming")
  Number of samples: 1153
  Number of genes: 315
  Alignment trimming threshold: gt = 0.5 (removed sites with >50% missing data)
  Genes per sample: 158
- Dataset3 (dataset 1 without shotgun samples)
  Number of samples: 841
  Number of genes: 331
  Alignment trimming threshold: gt = 0.1 (removed sites with >90% missing data)
  Genes per sample: 166

2.1. Raw sequences
Raw Ang353 sequence assemblies for all samples (pre-trimming and filtering).
- raw_Ang353_sequences.zip

2.2. Nuclear gene alignments
Trimmed alignments from datasets 1, 2 and 3.
- alignments_dataset1_main_final.zip
- alignments_dataset2_strict_trimming_final.zip
- alignments_dataset3_no_shotgun_final.zip

2.3. Nuclear gene trees
Gene trees inferred using RAxML (GTRCAT, 100 bootstraps) for the alignments from datasets 1, 2 and 3.
- gene_trees_dataset1_main_final.zip
- gene_trees_dataset2_strict_trimming_final.zip
- gene_trees_dataset3_no_shotgun_final.zip

2.4. Multigene species trees
Multigene species trees obtained using Astral-Pro3 from gene trees for datasets 1, 2 and 3.
astralpro_trees.zip, which includes:
- trees_Ang353_grasses_dataset1_main_gtrcat.astralpro
- trees_Ang353_grasses_dataset2_strict_trimming_gtrcat.astralpro
- trees_Ang353_grasses_dataset3_no_shotgun_gtrcat.astralpro

3. Gene tree-species tree reconciliation

generax.zip
Compressed zip archive with input files and results, including log files, of the GeneRax reconciliation analysis. One subfolder for each of the four analyses run: "all_tribes", "Andropogoneae", "Bambusoideae", "Triticeae".

transfers_reconciliation_analyses.zip, which includes:
- transfers_all_all_tribes.tsv: Tab-separated file with all transfers inferred with the tribe-level Poaceae reconciliation analysis. Each line represents one transfer inferred.
- transfers_all_Andropogoneae.tsv: Tab-separated file with all transfers inferred with the Andropogoneae reconciliation analysis. Each line represents one transfer inferred.
- transfers_all_Bambusoideae.tsv: Tab-separated file with all transfers inferred with the Bambusoideae reconciliation analysis. Each line represents one transfer inferred.
- transfers_all_Triticeae.tsv: Tab-separated file with all transfers inferred with the Triticeae reconciliation analysis. Each line represents one transfer inferred.
- transfers_counts_all_tribes.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the tribe-level Poaceae reconciliation analysis.
- transfers_counts_Andropogoneae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Andropogoneae reconciliation analysis.
- transfers_counts_Bambusoideae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Bambusoideae reconciliation analysis.
- transfers_counts_Triticeae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Triticeae reconciliation analysis.

4. Plastome data

Alignment and phylogenetic tree from plastome data.
plastome_files.zip, which includes:
- reduced_plastome_concat_CDS_trnLtrnF_trimmed.fna-out.fas: FASTA file with the final, concatenated DNA alignment of 71 plastome regions for 910 accessions, after data filtering.
- partitions.txt: Text file with positions of the 71 plastome regions in the concatenated alignment.
- plastome_concat_CDS_trnLtrnF_trimmed_TBE.raxml.support: Plastome tree with Transfer Bootstrap Expectation values as node labels.
- RAxML_bipartitions.plastome_concat_CDS_trnLtrnF_trimmed: Maximum likelihood plastome tree inferred with RAxML, with Felsenstein bootstrap values as node labels.
- RAxML_bootstrap.plastome_concat_CDS_trnLtrnF_trimmed: 100 rapid bootstrap pseudoreplicate plastome trees inferred with RAxML.
- RAxML_info.plastome_concat_CDS_trnLtrnF_trimmed: RAxML analysis log file.
- nuc_plastome_matching_tips.tab: Tab-separated file with accessions matched in nuclear-plastome comparison.

5. Poaceae-specific reference Ang353 dataset

Reference sequence dataset used for the assembly of Ang353 sequences in this study.
- target_Ang353_sequences_grasses.zip

6. Shotgun assembly script

Custom script used for the assembly of Ang353 sequences from shotgun data.
shotgun_assembler_script.zip, which includes:
- shotgun_assembler_Ang353_sequences.sh: script for assembly of short reads from shotgun data
- template_manifest_file.tsv: Tab-separated file to specify sample names and location of short read files (required by the assembly script)
- list_Ang353_genes_orthofinder.txt: list of Ang353 gene identifiers (required by the assembly script)

7. Quartet metrics script

R script to calculate the Quartet Concordance (QC) and Quartet Differential (QD) metrics from the gene tree frequencies/proportions for each quartet at a branch, following Pease et al. 2018 (American Journal of Botany, https://doi.org/10.1002/ajb2.1016).
- quartet_metrics.R
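The metadata table described under "1. Metadata" can be read with any TSV parser. The following is a minimal Python sketch, not part of the repository, that lists the accessions retained in the nuclear tree. It assumes that an empty removed_nuclear field means the accession was kept; the real file may encode this differently (for example with "NA"), so check before relying on it. The two sample rows and their IDs are hypothetical and only stand in for the real file.

```python
import csv
import io

# Hypothetical two-row excerpt using column names from the metadata description.
# The IDs and the removal reason are invented for illustration.
SAMPLE_TSV = (
    "analysis_ID\tacc_species\tsubfamily\tremoved_nuclear\n"
    "S001\tOryza sativa\tOryzoideae\t\n"
    "S002\tZea mays\tPanicoideae\tlow gene recovery\n"
)


def retained_in_nuclear_tree(handle):
    """Return rows whose removed_nuclear field is empty.

    Assumption: an empty field means the accession was not removed.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if not (row.get("removed_nuclear") or "").strip()
    ]


rows = retained_in_nuclear_tree(io.StringIO(SAMPLE_TSV))
print([row["analysis_ID"] for row in rows])
```

To run this against the real table, replace the io.StringIO object with open("gpwgIII_samples_metadata_taxonomy.tsv", newline="", encoding="utf-8").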
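The alignment trimming thresholds listed under "2. Nuclear data" (gt = 0.1 and gt = 0.5) can be illustrated with a small Python sketch. It is not the trimming tool used for this data set. It assumes a gap-threshold convention in which a column is kept when the fraction of sequences with a non-missing character is at least gt, and it treats "-", "?" and "N" as missing, which is also an assumption. The toy alignment is hypothetical.

```python
MISSING = set("-?Nn")  # assumption: characters counted as missing data


def trim_columns(alignment, gt):
    """Keep columns whose fraction of non-missing characters is >= gt.

    alignment maps sequence names to aligned strings of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(seq[i] not in MISSING for seq in seqs) / len(seqs) >= gt
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical toy alignment: column occupancies are 1.0, 0.5, 0.0 and 0.25.
toy = {"a": "AC-T", "b": "A---", "c": "AG--", "d": "A---"}

lenient = trim_columns(toy, 0.1)  # drops only the all-gap column
strict = trim_columns(toy, 0.5)   # also drops the column with 25% occupancy
print(lenient)
print(strict)
```

Under this convention the stricter threshold removes more sparse columns. That may explain the lower gene count in Dataset2 (315 versus 331), but the source does not state why the counts differ.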
18 September 2024