ATP synthase evolution on a cross-braced dated tree of life
DOI: 10.5281/zenodo.10012837
Zenodo: 10012837
MaRDI QID: Q6710718
FDO: Q6710718
Dataset published in the Zenodo repository.
Tom A. Williams, Edmund R. R. Moody, Tara A. Mahendrarajah, Philip C. J. Donoghue, Lénárd L. Szánthó, Adrián A. Davín, Dominik Schrempf, Gergely J. Szöllősi, Davide Pisani, Anja Spang, Nina Dombrowski
Publication date: 17 October 2023
Copyright license: Creative Commons Attribution 4.0 International
Abstract

The timing of early cellular evolution, from the divergence of Archaea and Bacteria to the origin of eukaryotes, is poorly constrained. The ATP synthase complex is thought to have originated prior to the Last Universal Common Ancestor (LUCA), and analyses of ATP synthase genes, together with ribosomes, have played a key role in inferring and rooting the tree of life. We reconstruct the evolutionary history of ATP synthases using an expanded taxon sampling set and develop a phylogenetic cross-bracing approach, constraining equivalent speciation nodes to be contemporaneous, based on the phylogenetic imprint of endosymbioses and ancient gene duplications. This approach results in a highly resolved, dated species tree and establishes an absolute timeline for ATP synthase evolution. Our analyses show that the divergence of ATP synthase into F- and A/V-type lineages was a very early event in cellular evolution dating back to more than 4 Ga, potentially predating the diversification of Archaea and Bacteria. Our cross-braced, dated tree of life also provides insight into more recent evolutionary transitions including eukaryogenesis, showing that the eukaryotic nuclear and mitochondrial lineages diverged from their closest archaeal (2.67-2.19 Ga) and bacterial (2.58-2.12 Ga) relatives at approximately the same time, with a slightly longer nuclear stem-lineage.

Repository Contents

1_100Eukaryote_genomes.tar.gz: includes all protein sequence files for the 100 Eukaryotes sampled in this study.

2_Phylogenies.tar.gz: includes all files used for phylogenetic analyses. Folders are organized as follows:

- 1_ATPsynthase_gene_trees: this folder contains all sequence, alignment, and tree files for the ATP synthase gene trees. Files are organized as follows and are associated with the corresponding parts of the manuscript: Figure 3, Figure 5B, Supplementary Figures 5-10, Supplementary Figures 18-19.
  - Folder '1_sequences' includes all unaligned fasta sequence files for each ATP synthase gene tree (see Methods).
  - Folder '2_alignments' includes all alignments generated using MAFFT L-INS-i (subdirectory: 1_untrimmed) and trimmed with BMGE (subdirectory: 2_trimmed).
  - Folder '3_treefiles' includes all IQ-TREE2 output files for all ATP synthase gene phylogenies. Any files with suffix *taxa.treefile contain the full taxonomic string for each accession.
  - Folder '4_pdfs' includes PDF files for each ATP synthase gene tree.
- 2_Eukaryotic_subsets: this folder contains all sequence, alignment, and tree files for ATP synthase Eukaryotic subset gene trees. Files are organized as follows and are associated with the corresponding parts of the manuscript: Supplementary Figure 11.
  - Folder '1_sequences' includes all unaligned fasta sequence files for the eukaryotic subsets.
  - Folder '2_alignments' includes all alignments generated using MAFFT L-INS-i (subdirectory: 1_untrimmed) and trimmed with BMGE (subdirectory: 2_trimmed).
  - Folder '3_treefiles' includes all Bayesian trees inferred for eukaryotic subsets.
  - Folder '4_pdfs' includes PDF files for each eukaryotic subset tree.
- 3_21eLife_concatenated_species_tree: this folder contains all sequence, alignment, and tree files for the single gene tree and concatenated phylogeny analyses (inferred using 21 single-copy marker genes, see Methods). Files are organized as follows and are associated with the following parts of the manuscript: Figure 1, Supplementary Figure 20.
  - Folder '1_inspection_start' corresponds to the initial manual inspection of the single gene trees and includes the following subdirectories:
    - Folder '1_sequences' includes all protein sequence fasta files corresponding to the 27 original single-copy marker genes.
    - Folder '2_alignments' includes all alignment files generated using MAFFT L-INS-i (subdirectory: 1_untrimmed) and trimmed with BMGE (subdirectory: 2_untrimmed).
    - Folder '3_treefiles' includes all IQ-TREE2 output files for all phylogenies (27 single-copy marker genes).
    - Folder '4_pdfs' includes PDF files for each single gene tree.
  - Folder '2_inspection_final' corresponds to the final manual inspection of the single gene trees and includes the following subdirectories:
    - Folder '1_sequences' includes all protein sequence fasta files corresponding to the final 21 single-copy marker genes.
    - Folder '2_alignments' includes all alignment files generated using MAFFT L-INS-i (subdirectory: 1_untrimmed) and trimmed with BMGE (subdirectory: 2_untrimmed).
    - Folder '3_treefiles' includes all IQ-TREE2 output files for all phylogenies (21 single-copy marker genes).
    - Folder '4_pdfs' includes PDF files for each single gene tree.
  - Folder '3_concatenated_phylogeny' contains the concatenated alignment generated from the final 21 single-copy marker gene alignments.
    - Folder '1_alignment' includes the concatenated alignment generated from the 21 trimmed alignments from the final inspection.
    - Folder '2_treefiles' includes all IQ-TREE2 output files for trees inferred using the two different models (subdirectories: LG+C20+R+F and LG+C60+R+F).
  - Folder '4_Eukaryote_only_phylogeny' contains sequence, alignment, and tree files for 21 single-copy marker genes used to infer a Eukaryote-only phylogeny. Folder is organized as follows and files correspond to Supplementary Figure 3:
    - Folder '1_sequences' includes all protein sequence fasta files corresponding to the 21 single-copy marker genes with only Eukaryotes.
    - Folder '2_alignments' includes all alignment files generated using MAFFT L-INS-i (subdirectory: 1_untrimmed) and trimmed with BMGE (subdirectory: 2_untrimmed).
    - Folder '3_concatenated_phylogeny' includes the concatenated alignment generated from 21 single-copy markers with only Eukaryotes (subdirectory: 1_alignment) and all IQ-TREE2 output files for the concatenated phylogeny (subdirectory: 2_treefiles).
    - Folder '4_pdfs' includes PDF files for the concatenated Eukaryote tree.
- 4_Ribosomal_species_tree: this folder contains all sequence, alignment, and tree files for the single gene tree and concatenated phylogeny analyses (inferred using 12 ribosomal marker genes, see Methods). Files are organized as follows and are associated with the corresponding parts of the manuscript: Figure 5A, Figure 5C, Supplementary Figures 12-16, Supplementary Figure 21.
  - Folder '1_sequences' includes all protein sequence fasta files for the original 15 ribosomal proteins. Sequence sets include the best-hit Archaea and Bacteria, and nuclear, mitochondrial, and plastid eukaryotic homologs.
  - Folder '2_alignments' includes all alignment files generated using MAFFT L-INS-i (subdirectory: 1_untrimmed) and trimmed with trimAl (gappy-out) (subdirectory: 2_trimmed).
  - Folder '3_treefiles' includes all original FastTree tree files and tree files with highlighted sequences to remove (*blue-to-rem = eukaryotic nuclear homolog only; *colored-to-rem = eukaryotic nuclear, mitochondrial, and plastid homologs). PDFs of each marker gene tree are also included that depict highlighting of sequences to keep and/or remove.
  - Folder '4_concatenated_phylogeny' contains the concatenated alignment generated from the final 12 ribosomal marker genes.
    - Folder '1_alignment' includes the concatenated alignment generated with 12 ribosomal marker proteins in MAFFT L-INS-i and trimmed with trimAl (gappy-out).
    - Folder '2_phylogeny' includes all IQ-TREE2 output files for the species tree inferred using the LG+C60+R+F model.
- 5_Dating_analysis: includes all Mcmcdate output files for the dating analyses (species tree and ATP synthase gene tree, see Methods).
  - Folder '0_Starting_species_phylogenies' includes the treefiles (with and without taxonomic string) for the Edited1 and Edited2 topologies that were used in the dating analyses (see Methods).
  - Folder '1_Edited1_dating' includes all dated tree files and monitor files for braced and unbraced analyses of the Edited1 species tree topology. Data corresponds to Supplementary Figure 12, Supplementary Figures 14-15.
  - Folder '2_Edited2_dating' includes all dated tree files and monitor files for braced and unbraced analyses of the Edited2 (focal) species tree topology. Data corresponds to Figure 5A, Figure 5C, Supplementary Figure 13, Supplementary Figure 16.
  - Folder '3_ATP_synthase_dating' includes all dated tree files and monitor files for braced and unbraced analyses of the ATP synthase gene tree. Data corresponds to Figure 5B, Supplementary Figures 18-19.

3_Scripts.tar.gz: includes all workflows and scripts used for phylogenetic analyses.

- 1_workflows: includes bash workflows for phylogenetic analyses (details on software versions are included in each workflow summary):
  - Workflow_ATPsynthase_gene_trees.sh: generation of the ATP synthase phylogenies
  - Workflow_21eLife_marker_phylogeny.sh: inferring the 21 marker-gene species tree
  - Workflow_Ribosomal_species_tree.sh: inferring the 12 ribosomal marker-gene species tree
  - Workflow_Database_annotations.sh: workflow for gene annotation for 800 sampled Archaea, Bacteria, and Eukaryota
- 2_R_scripts: includes R scripts used for the Eukaryote sequence contamination screening (Figure 1, Figure 2, Supplementary Figure 2, Supplementary Figures 4, 5, 8-10), presence-absence analyses (Figure 1, Figure 2, Supplementary Figure 2), and plotting tree figures (Supplementary Figures 4-10). Input mapping files and R output files are included.
  - Folder '1_Euk_contamination_screen' contains the workflow 'Eukaryote_contamination_screen.Rmd' used to inspect Eukaryotic ATP synthase sequences for bacterial contamination.
  - Folder '2_Presence_absence' includes sub-directories:
    - Folder '1_Species_tree' includes the treefile(s) used for ordering the plots in Figure 1 and Supplementary Figure 2 ('1_tree'), the taxonomic and COG mapping files and the list of putative contamination to remove ('2_input_files'), the raw count table for all 800 taxa ('3_Output_files'), R output plot(s) ('4_Plotting'), and the script to generate presence-absence plots 'Presence-absence.R'.
    - Folder '2_Eukaryotes_only' includes organelle information, protein mapping files, taxonomic mapping files, and the list of putative contamination to remove ('1_Input_files'); the raw count table of ATP synthase subunits ('2_Output_files'); and R output plots ('3_Output_files'). Please see 'Eukaryote_contamination_screen.Rmd' in parent directory '2_R_scripts' for more information on how Eukaryotic sequences were screened, how the list of contaminating sequences was curated, and how the plot for Figure 2 was generated.
  - Folder '3_Plotting_trees' includes the rectangular and radial trees generated for each ATP synthase tree (see Supplementary Figures 5-10). Trees were generated from the treefiles for the ATP synthase gene trees (see above) and the script 'Plotting_trees.Rmd'.
  - Marker_gene_counts.R: script used to count marker genes per genome (see Methods).
- 3_TimeTree: includes python scripts used to generate the time-trees (Figure 5C, Supplementary Figures 15 and 19).
- 4_ALE_workflow: example bash workflow used to run ALE. For details see Methods.
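
The concatenated alignments listed above are built by joining the trimmed per-marker alignments end to end. The Python sketch below illustrates that general step only; it is not the authors' workflow (see the bash workflows in 3_Scripts.tar.gz for the actual commands). It assumes that each FASTA header starts with a taxon identifier shared across markers, and that taxa missing from a marker are padded with gaps. The function names and the toy sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a {taxon: sequence} dict (insertion-ordered).

    Assumption: the first whitespace-delimited token of each header is the
    taxon identifier and is identical across marker alignments.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in records.items()}


def concatenate(alignments):
    """Join per-marker alignments into one supermatrix.

    alignments: list of {taxon: aligned_sequence} dicts, one per marker.
    A taxon absent from a marker receives a run of gaps of that marker's width.
    """
    # Collect taxa in order of first appearance across all markers.
    taxa = list(dict.fromkeys(t for aln in alignments for t in aln))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input: two tiny, already aligned and trimmed markers (made-up data).
marker_a = ">taxon1\nMK-L\n>taxon2\nMKAL\n"
marker_b = ">taxon1\nGG\n>taxon3\nGA\n"

combined = concatenate([read_fasta(marker_a), read_fasta(marker_b)])
for taxon, sequence in combined.items():
    print(f">{taxon}\n{sequence}")
```

Gap padding keeps every row of the supermatrix the same length, which tree inference programs require of an alignment. In practice the per-marker files would be read from the trimmed alignment folders rather than from inline strings.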