Grass Phylogeny Working Group III: data repository




DOI: 10.5281/zenodo.13778480
Zenodo: 13778480
MaRDI QID: Q6687751
FDO: Q6687751

Dataset published in the Zenodo repository.

Joanne L. Birch, Michael R. McKain, Richard W. Jobson, Aelys M. Humphreys, John J. E. Thompson, Hong Ma, Gerrit Davidse, Ximena Londoño, Felix Forest, Todd G. B. McLay, Elizabeth A. Kellogg, Jeffrey Bennetzen, Luke T. Dunning, Neville G. Walsh, Wenli Chen, Jing-Xia Liu, Marc Sosef, Amanda E. Fisher, Olinirina P. Nanjarisoa, Darren M. Crayn, Daniel J. Murphy, William J. Baker, Isabel Larridon, Alexandre R. Zuntini, Fernando O. Zuloaga, Jeffery M. Saarela, Charlotte Couch, Pilar Catalán, Weichen Huang, Neil W. Snow, Cassiano A. D. Welker, Michelle Waycott, Chien-Hsun Huang, G. Anthony Verboom, Maria S. Vorontsova, Sarah Z. Ficinski, Guy E. Onjalalaina, Paul M. Peterson, Jan Hackel, Thomas Haevermans, Olivier Maurin, Watchara Arthan, Rokiman Letsara, Maria Fernanda Moreno-Aguilar, Teera Watcharamongkol, Trevor R. Hodkinson, Maarten Christenhusz, Martin D. Xanthos, Siri Fjellheim, Nianhe Xia, Paweena Traiperm, Rivontsoa A. Rakotonasolo, Jacob D. Washburn, J. Travis Columbus, Lynn G. Clark, Canisius J. Kayombo, Russell L. Barrett, Pascal-Antoine Christin, Alexander Zizka, Robert J. Soreng, Matheus Bianconi, Quentin W. R. Luke, Lynn J. Gillespie, Lin Zhang, Melvin R. Duvall, Matthew D. Barrett, Lalita Simpson, Jacqueline Razanatsoa, Terry D. MacFarlane, Guillaume Besnard, John M. Kimeu, Soejatmi Dransfield, De-Zhu Li

Publication date: 18 September 2024

Copyright license: Creative Commons Attribution 4.0 International



Phylogenetic analyses of the grass family (Poaceae) using nuclear and plastid data. The data set includes 1153 accessions corresponding to 1133 accepted species. Genomic data were obtained from several sources, including target capture, shotgun sequencing, transcriptomes and annotated genomes. Nuclear markers (Angiosperms353 gene set, hereafter Ang353) were assembled from short-read data using HybPiper or a custom assembly pipeline optimized for low-coverage shotgun data. Plastid genes were either retrieved from published plastome sequences or assembled here using GetOrganelle. This data set also includes the results of a gene tree-species tree reconciliation analysis using GeneRax.

Contact persons:
- Matheus E. Bianconi (matheus-enrique.bianconi@univ-tlse3.fr)
- Jan Hackel (jan.hackel@uni-marburg.de)
- Maria S. Vorontsova (m.vorontsova@kew.org)

Content description

1. Metadata

gpwgIII_samples_metadata_taxonomy.tsv: tab-separated file with details for all 1,702 accessions used in this study. Columns:
- analysis_ID: ID in nuclear analyses
- analysis_ID_plastome: ID in plastome analyses
- acc_species: accepted species name
- acc_species_author: taxonomic species authority
- acc_genus: accepted genus name
- acc_genus_author: taxonomic genus authority
- publication: associated prior publication
- data type: type of sequence data
- isolate: laboratory isolate ID
- voucher_ID: herbarium voucher ID
- germplasm_ID: germplasm collection ID
- repo_accession: accession number in public repository
- plastome_accession: accession number of assembled plastome sequence
- removed_nuclear: reason for removal from nuclear tree, if applicable
- removed_plastome: reason for removal from plastome tree, if applicable
- soreng2022_genus: genus name in Soreng et al. 2022 (https://doi.org/10.1111/jse.12847)
- subtribe, tribe, subfamily, major.clade: classification according to Soreng et al. 2022

2. Nuclear data

- Dataset 1 ("main")
  Number of samples: 1153
  Number of genes: 331
  Alignment trimming threshold: gt = 0.1 (sites with more than 90% missing data removed)
  Genes per sample: 166
- Dataset 2 ("strict trimming")
  Number of samples: 1153
  Number of genes: 315
  Alignment trimming threshold: gt = 0.5 (sites with more than 50% missing data removed)
  Genes per sample: 158
- Dataset 3 (Dataset 1 without shotgun samples)
  Number of samples: 841
  Number of genes: 331
  Alignment trimming threshold: gt = 0.1 (sites with more than 90% missing data removed)
  Genes per sample: 166

2.1. Raw sequences
Raw Ang353 sequence assemblies for all samples (before trimming and filtering).
- raw_Ang353_sequences.zip

2.2. Nuclear gene alignments
Trimmed alignments from Datasets 1, 2 and 3.
- alignments_dataset1_main_final.zip
- alignments_dataset2_strict_trimming_final.zip
- alignments_dataset3_no_shotgun_final.zip

2.3. Nuclear gene trees
Gene trees inferred with RAxML (GTRCAT, 100 bootstrap replicates) from the alignments of Datasets 1, 2 and 3.
- gene_trees_dataset1_main_final.zip
- gene_trees_dataset2_strict_trimming_final.zip
- gene_trees_dataset3_no_shotgun_final.zip

2.4. Multigene species trees
Multigene species trees inferred with Astral-Pro3 from the gene trees of Datasets 1, 2 and 3.
- astralpro_trees.zip, which includes:
  - trees_Ang353_grasses_dataset1_main_gtrcat.astralpro
  - trees_Ang353_grasses_dataset2_strict_trimming_gtrcat.astralpro
  - trees_Ang353_grasses_dataset3_no_shotgun_gtrcat.astralpro

3. Gene tree-species tree reconciliation

- generax.zip: compressed archive with the input files and results (including log files) of the GeneRax reconciliation analyses, with one subfolder for each of the four analyses: "all_tribes", "Andropogoneae", "Bambusoideae" and "Triticeae".
- transfers_reconciliation_analyses.zip, which includes:
  - transfers_all_all_tribes.tsv: tab-separated file with all transfers inferred in the tribe-level Poaceae reconciliation analysis, one line per inferred transfer.
  - transfers_all_Andropogoneae.tsv: tab-separated file with all transfers inferred in the Andropogoneae reconciliation analysis, one line per inferred transfer.
  - transfers_all_Bambusoideae.tsv: tab-separated file with all transfers inferred in the Bambusoideae reconciliation analysis, one line per inferred transfer.
  - transfers_all_Triticeae.tsv: tab-separated file with all transfers inferred in the Triticeae reconciliation analysis, one line per inferred transfer.
  - transfers_counts_all_tribes.tsv: tab-separated file with aggregated transfer counts, in both directions, for each reticulate connection in the tribe-level Poaceae reconciliation analysis.
  - transfers_counts_Andropogoneae.tsv: tab-separated file with aggregated transfer counts, in both directions, for each reticulate connection in the Andropogoneae reconciliation analysis.
  - transfers_counts_Bambusoideae.tsv: tab-separated file with aggregated transfer counts, in both directions, for each reticulate connection in the Bambusoideae reconciliation analysis.
  - transfers_counts_Triticeae.tsv: tab-separated file with aggregated transfer counts, in both directions, for each reticulate connection in the Triticeae reconciliation analysis.

4. Plastome data
Alignment and phylogenetic trees from plastome data.
- plastome_files.zip, which includes:
  - reduced_plastome_concat_CDS_trnLtrnF_trimmed.fna-out.fas: FASTA file with the final concatenated DNA alignment of 71 plastome regions for 910 accessions, after data filtering.
  - partitions.txt: text file with the positions of the 71 plastome regions in the concatenated alignment.
  - plastome_concat_CDS_trnLtrnF_trimmed_TBE.raxml.support: plastome tree with Transfer Bootstrap Expectation (TBE) values as node labels.
  - RAxML_bipartitions.plastome_concat_CDS_trnLtrnF_trimmed: maximum likelihood plastome tree inferred with RAxML, with Felsenstein bootstrap values as node labels.
  - RAxML_bootstrap.plastome_concat_CDS_trnLtrnF_trimmed: 100 rapid bootstrap pseudoreplicate plastome trees inferred with RAxML.
  - RAxML_info.plastome_concat_CDS_trnLtrnF_trimmed: RAxML analysis log file.
  - nuc_plastome_matching_tips.tab: tab-separated file with the accessions matched in the nuclear-plastome comparison.

5. Poaceae-specific Ang353 reference dataset
Reference sequence dataset used for the assembly of Ang353 sequences in this study.
- target_Ang353_sequences_grasses.zip

6. Shotgun assembly script
Custom script used to assemble Ang353 sequences from shotgun data.
- shotgun_assembler_script.zip, which includes:
  - shotgun_assembler_Ang353_sequences.sh: script for the assembly of short reads from shotgun data
  - template_manifest_file.tsv: tab-separated file specifying sample names and the locations of short-read files (required by the assembly script)
  - list_Ang353_genes_orthofinder.txt: list of Ang353 gene identifiers (required by the assembly script)

7. Quartet metrics script
R script that calculates the Quartet Concordance (QC) and Quartet Differential (QD) metrics from the gene tree frequencies/proportions for each quartet at a branch, following Pease et al. 2018 (American Journal of Botany, https://doi.org/10.1002/ajb2.1016).
- quartet_metrics.R
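The metadata table in section 1 records, for each accession, whether and why it was dropped from the nuclear or plastome tree. As a minimal sketch of how the retained accessions might be recovered, the Python snippet below reads the table with the standard csv module. It assumes a header row using the column names listed in section 1 and that an empty removed_nuclear or removed_plastome cell means the accession was kept; the function name retained_ids and the toy rows are hypothetical.

```python
import csv
import io
from typing import List, TextIO


def retained_ids(handle: TextIO, id_column: str, removal_column: str) -> List[str]:
    """Return IDs of accessions whose removal-reason cell is empty.

    Assumption (hypothetical): an empty removed_nuclear / removed_plastome
    cell means the accession was kept in the corresponding tree.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row[id_column]
        for row in reader
        # DictReader fills missing trailing cells with None, hence the `or ""`.
        if not (row.get(removal_column) or "").strip()
    ]


# Invented rows standing in for gpwgIII_samples_metadata_taxonomy.tsv.
toy = io.StringIO(
    "analysis_ID\tanalysis_ID_plastome\tacc_species\tremoved_nuclear\tremoved_plastome\n"
    "A001\tP001\tSpecies one\t\t\n"
    "A002\tP002\tSpecies two\tplaceholder reason\t\n"
)
print(retained_ids(toy, "analysis_ID", "removed_nuclear"))  # → ['A001']
```

For the plastome tree, the same call would use analysis_ID_plastome and removed_plastome. To run it on the real file, pass open("gpwgIII_samples_metadata_taxonomy.tsv", newline="", encoding="utf-8") instead of the toy buffer; the UTF-8 encoding is an assumption.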
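The trimming thresholds in section 2 are given as gt values. Reading gt as the minimum fraction of sequences that must contain data at a site (the meaning of the -gt option in trimAl; the description does not say which trimming tool was used), gt = 0.1 keeps a site only if at least 10% of sequences have data there, so sites with more than 90% missing data are dropped, and gt = 0.5 drops sites with more than 50% missing data. The sketch below applies that rule to a toy alignment; treating "-" and "?" as missing data is an assumption, and trim_columns is a hypothetical name.

```python
from typing import Dict, List

# Assumed missing-data symbols; adjust if the alignments use others (e.g. "N").
MISSING = frozenset("-?")


def trim_columns(alignment: Dict[str, str], gt: float) -> Dict[str, str]:
    """Keep only sites where at least a fraction `gt` of sequences have data.

    With gt = 0.1, a site survives if >= 10% of sequences have data there,
    i.e. sites with more than 90% missing data are removed.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    keep: List[int] = []
    for i in range(length):
        present = sum(1 for s in seqs if s[i] not in MISSING)
        if present / len(seqs) >= gt:
            keep.append(i)
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


# Ten toy sequences: site 1 has data in one sequence only (90% missing),
# site 2 is missing everywhere.
toy = {f"s{i}": "A" + ("C" if i == 0 else "-") + "-" for i in range(10)}
print(trim_columns(toy, 0.1)["s0"])  # → AC  (site 1 kept at gt = 0.1)
print(trim_columns(toy, 0.5)["s0"])  # → A   (site 1 removed at gt = 0.5)
```

Under this reading a site with exactly 90% missing data is still kept at gt = 0.1; only sites above that level are removed.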
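Section 3 distinguishes per-transfer files (transfers_all_*.tsv, one line per inferred transfer) from aggregated counts (transfers_counts_*.tsv, counts in both directions for each reticulate connection). The sketch below shows one way such an aggregation could be done. The column names donor and receiver, and the function names, are hypothetical because the file layout is not described here; this illustrates the idea and is not the authors' pipeline.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_transfer_pairs(
    handle: TextIO, donor_col: str = "donor", receiver_col: str = "receiver"
) -> Iterator[Tuple[str, str]]:
    """Yield (donor, receiver) pairs; the column names are hypothetical."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row[donor_col], row[receiver_col]


def aggregate_transfers(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Collapse individual transfers into per-connection counts.

    Each connection is keyed by its two clades sorted alphabetically, (a, b);
    the value is (number of a -> b transfers, number of b -> a transfers).
    """
    directed = Counter(pairs)
    counts: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for donor, receiver in directed:
        a, b = sorted((donor, receiver))
        # Counter returns 0 for a direction that never occurs.
        counts[(a, b)] = (directed[(a, b)], directed[(b, a)])
    return counts


# Invented transfers between placeholder clades.
toy = io.StringIO(
    "donor\treceiver\n"
    "CladeA\tCladeB\n"
    "CladeA\tCladeB\n"
    "CladeB\tCladeA\n"
    "CladeC\tCladeA\n"
)
print(aggregate_transfers(read_transfer_pairs(toy)))
# → {('CladeA', 'CladeB'): (2, 1), ('CladeA', 'CladeC'): (0, 1)}
```

Keying each connection by its sorted pair of clades keeps both directions of the same reticulate connection on one record, which matches the "both directions for each reticulate connection" wording.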
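The plastome alignment in section 4 is a concatenation of 71 regions whose positions are listed in partitions.txt. The sketch below splits a concatenated FASTA alignment back into per-region sub-alignments. It assumes RAxML-style partition lines of the form "DNA, name = start-end" with 1-based inclusive coordinates; the exact layout of partitions.txt is not described here, and the function names, region names and sequences in the example are invented.

```python
import re
from typing import Dict, List, Tuple

# Assumed RAxML-style partition line: "<model>, <region> = <start>-<end>".
PARTITION_LINE = re.compile(r"^\s*[^,]+,\s*(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_partitions(text: str) -> List[Tuple[str, int, int]]:
    """Return (region, start, end) tuples with 1-based inclusive coordinates."""
    parts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = PARTITION_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised partition line: {line!r}")
        parts.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return parts


def read_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA reader: sequence name = first word of the header line."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif line and name is not None:
            chunks[name].append(line)
    return {key: "".join(seq) for key, seq in chunks.items()}


def split_alignment(
    alignment: Dict[str, str], parts: List[Tuple[str, int, int]]
) -> Dict[str, Dict[str, str]]:
    """Cut every concatenated sequence into one sub-alignment per region."""
    return {
        region: {name: seq[start - 1 : end] for name, seq in alignment.items()}
        for region, start, end in parts
    }


# Invented two-region example.
parts = parse_partitions("DNA, matK = 1-4\nDNA, rbcL = 5-7\n")
aln = read_fasta(">acc1\nATGC\nAAT\n>acc2\nATG-AA-\n")
print(split_alignment(aln, parts)["rbcL"])  # → {'acc1': 'AAT', 'acc2': 'AA-'}
```

Lines with codon-position strides, such as "1-300\3", do not match the assumed pattern and raise a ValueError instead of being silently misread.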






