Inferring and comparing metabolisms across heterogeneous sets of annotated genomes using AuCoMe

DOI: 10.5281/zenodo.7752449; Zenodo ID: 7752449; MaRDI QID: Q6715939; FDO: Q6715939

Dataset published in the Zenodo repository.

Jonas Collen, Simon M. Dittami, Anne Siegel, Méziane Aite, Arnaud Belcour, Samuel Blanquart, Clémence Frioux, Catherine Leblanc, Jeanne Got, Ludovic Delage, Gabriel V. Markov

Publication date: 20 March 2023

Copyright license: Creative Commons Attribution 4.0 International

Description

CONTENT OF THIS ARCHIVE

The Zenodo archive is composed of one file and four main directories:
* analyses: gathers three subdirectories (algae, bacteria, and fungi) with all the files used to create the figures, supplemental figures, and results of the paper.
* code: contains all the AuCoMe and PADMET code.
  - aucome_v0.5.1: the AuCoMe code used to process the three datasets.
  - padmet_v5.0.1: the PADMET code used to run AuCoMe.
* datasets: gathers all the datasets on which AuCoMe was run: the bacterial, fungal, and algal datasets, and the 32 synthetic datasets, which contain an E. coli K12 MG1655 genome to which various degradations were applied, together with 28 other bacterial genomes. It also includes version 23.5 of the MetaCyc database.
* scripts_analyses: contains the scripts used to generate the figures and supplemental figures, and a script to degrade the E. coli K12 MG1655 genome.

1/ Content of the analyses directory

It is composed of three subdirectories: algae, bacteria, and fungi.

1.1/ Content of the algae subdirectory

It contains 9 files:
* Figure_2_algal_nb_reactions.tsv: for each species of the algal dataset, gives the number of reactions at each AuCoMe step. It was used to create Figure 2D.
* Figure_S10_Deepec_algal.tsv: for each species of the algal dataset and each AuCoMe step (robust orthology, non-robust orthology, and annotation or orthology), gives the number of reactions, the number of ECs, the number of ECs validated by DeepEC, and the ratio of ECs validated by DeepEC to the total number of ECs. It was used to create Figure S10(b).
* Table_S6_50_random_reactions_found.xlsx: manual validation of 50 randomly chosen reactions found in any of the species (Supplemental Table S6).
* Table_S7_50_random_reactions_absent.xlsx: manual validation of 50 randomly chosen reactions absent from a species (Supplemental Table S7).
* Table_S8_reactions_common_only_Cokamuranus_Sjaponica.xlsx: reactions common to Saccharina japonica and Cladosiphon okamuranus but not found in other brown algae (Supplemental Table S8).
* Table_S9_homologues_Esiliculosus_Sjaponica.xlsx: additional homologs in E. siliculosus found by BLASTP searches for sequences inferred to be present only in C. okamuranus and S. japonica (Supplemental Table S9).
* Table_S10_o-aminophenol_Esiliculosus_holomogues.xlsx: additional o-aminophenol oxidases from E. siliculosus and their homologs in other stramenopiles. It is Supplemental Table S10 with more details (such as the amino acid sequences).
* Table_S11_reactions_cryptophytes_haptophytes_stramenopiles_archeplastida.xlsx: reactions distinguishing the cryptophyte, haptophyte, stramenopile, and Archaeplastida groups (Supplemental Table S11).
* Table_S12_pathways_cryptophytes_haptophytes_stramenopiles_archeplastida.xlsx: metabolic pathways shared by, or absent from, cryptophytes, haptophytes, stramenopiles, and Archaeplastida (Supplemental Table S12).

1.2/ Content of the bacteria subdirectory

It gathers 12 files and 9 directories:
* aucome_final.tsv: output file of the figure_S4_comparison_bacteria.py script. For each of the 29 bacterial metabolic networks produced with AuCoMe, this table gives the number of ECs, the number of unique ECs, the total number of reactions, the number of enzymatic reactions with genes, the number of enzymatic reactions without genes, and the number of spontaneous reactions.
* carveme_stat.tsv: output file of the figure_S4_comparison_bacteria.py script. It gives the same statistics as aucome_final.tsv for each of the 29 bacterial metabolic networks produced with CarveMe.
* ecocyc.padmet: version 23.5 of the EcoCyc database in the PADMet format, used to generate Supplemental Fig. S5.
* Figure_2_bacterial_nb_reactions.tsv: for each species of the bacterial dataset, gives the number of reactions at each AuCoMe step. It was used to create Figure 2B.
* Figure_3_nb_reactions_step.tsv: for each of the 32 synthetic bacterial datasets, gives the number of reactions at each AuCoMe step. It was used to create Figure 3A.
* Figure_3_fmeasure_steps.tsv: for each of the 32 synthetic bacterial datasets, gives the F-measures obtained by comparing the GSMN recovered for each E. coli K12 MG1655 genome replicate with the gold-standard EcoCyc network. It was used to create Figure 3B.
* Figure_S4_output: contains 3 output files of the figure_S4_comparison_bacteria.py script:
  - Figure_S4_boxplot_networks.svg: Supplemental Figure S4 in high resolution.
  - Figure_S4_boxplot_networks.tsv: the number of reactions, the type of reactions (All, Reactions with genes, ...), and the software used. These data were produced and used by the figure_S4_comparison_bacteria.py script.
  - Figure_S4_barplot_time_networks.svg: for each software, the time (in seconds) required to reconstruct the bacterial metabolic networks.
* Figure_S5_output: contains 3 output files of the figure_S5_reference_catalog.py script:
  - Figure_S5_ec_union.svg: Supplemental Figure S5 in high resolution.
  - Figure_S5_ec_union_venn.svg: another visualization of the results of Supplemental Fig. S5.
  - Figure_S5_refence_ec_catalog_K12MG1655.tsv: an EC catalog of E. coli K-12 MG1655 built from the BiGG, EcoCyc, KEGG, and ModelSEED databases, used to produce Supplemental Figure S5.
* Figure_S6_output: contains 2 output files of the figure_S6.py script:
  - Figure_S6_comparison_all.svg: Supplemental Figure S6 in high resolution.
  - Figure_S6_comparison_all.tsv: the data used to produce Supplemental Figure S6.
* gapseq_stat.tsv: output file of the figure_S4_comparison_bacteria.py script. It gives the same statistics as aucome_final.tsv for each of the 29 bacterial metabolic networks produced with gapseq.
* jsons_bigg_todate: the five metabolic networks of E. coli K12 MG1655 available in BiGG, in the JSON format. They correspond to the BiGG reference metabolic network of Supplemental Figure S5.
* jsons_modelseed_todate: the metabolic network of E. coli K12 MG1655 available in ModelSEED, in the JSON format. It is the ModelSEED reference metabolic network of Supplemental Figure S5.
* kegg_ecs.txt: input file of the figure_S5_reference_catalog.py script, mapping EC numbers to all the E. coli K12 MG1655 entries of the KEGG database.
* mapping_modelseed_ec.tsv: input file of the figure_S4_comparison_bacteria.py script, mapping ModelSEED reactions to EC numbers.
* modelseed_stat.tsv: output file of the figure_S4_comparison_bacteria.py script. It gives the same statistics as aucome_final.tsv for each of the 29 bacterial metabolic networks produced with ModelSEED.
* networks_aucome: for each of the 29 bacteria, a metabolic network in the PADMet format obtained with AuCoMe.
* networks_carveme: for each of the 29 bacteria, a metabolic network in the SBML format obtained with CarveMe.
* networks_gapseq: 29 subdirectories (one per bacterium), each containing 10 files describing the metabolic network obtained with gapseq:
  - species-all-Pathways.tbl: pathway data in the TBL format.
  - species-all-Reactions.tbl: reaction data in the TBL format.
  - species-draft.RDS: draft metabolic network as an RDS (R Data Format) file.
  - species-draft.xml: draft metabolic network in the SBML format.
  - species-medium.csv: the metabolites of the default medium.
  - species.RDS: final metabolic network as an RDS file.
  - species-rxnWeights.RDS: temporary RDS file needed by gapseq fill.
  - species-rxnXgenes.RDS: temporary RDS file needed by gapseq fill.
  - species-Transporter.tbl: transporter data in the TBL format.
  - species.xml: final metabolic network in the SBML format.
* networks_modelseed: includes two subdirectories:
  - sbml: for each of the 29 bacteria, a metabolic network in the SBML format obtained with ModelSEED.
  - tsv: for each of the 29 bacteria, two TSV files:
    - genomeset__species.gbk_genome.fbamodel-compounds.tsv: compound data.
    - genomeset__species.gbk_genome.fbamodel-reactions.tsv: reaction data.
* time_carveme.txt: input file of the figure_S4_comparison_bacteria.py script, storing the running time (in seconds) of CarveMe to reconstruct the metabolic network of each of the 29 bacteria.
* time_gapseq.txt: input file of the figure_S4_comparison_bacteria.py script, storing the running time (in seconds) of gapseq to reconstruct the metabolic network of each of the 29 bacteria.

1.3/ Content of the fungi directory

It contains three files and five directories:
* All-pathways-of-S.-cerevisiae-S288c.txt: all the YeastCyc pathways.
* Figure_2_fungal_nb_reactions.tsv: for each species of the fungal dataset, gives the number of reactions at each AuCoMe step. It was used to create Figure 2C.
* Figure_S7_output: contains 11 output files of the figure_S7_comparison_pathway_fungi.py script:
  - completion_pathway_species.svg: one file for each of the 5 fungi (L. bicolor, N. crassa, R. oryzae, S. cerevisiae S288C, and S. pombe), each being a subfigure of Supplemental Fig. S7.
  - fungi_stats.tsv: Supplemental Table S5.
  - pathway_venn_species.png: one file for each of the 5 fungi (L. bicolor, N. crassa, R. oryzae, S. cerevisiae S288C, and S. pombe), each showing a Venn diagram of the pathways found with the 3 tools (AuCoMe, gapseq, and ModelSEED).
* Figures_S8_S9_output: contains 11 files comparing the pathways of the S. cerevisiae S288C metabolic networks obtained with AuCoMe and gapseq with those of YeastCyc:
  - comparison_yeastcyc.png: the numbers of true-positive, false-positive, and false-negative pathways found by each method (AuCoMe and gapseq).
  - completion_pathway_gapseq.svg: the number of pathways common or specific to YeastCyc and gapseq, with their completeness ratio predicted by gapseq.
  - Figure_S8_completion_pathway_aucome.svg: the number of pathways common or specific to YeastCyc and AuCoMe, with their completeness ratio predicted by AuCoMe (Supplemental Figure S8).
  - Figure_S9_venn_diagram_70_100.svg: Supplemental Figure S9, comparing the AuCoMe, gapseq, and YeastCyc pathways with a completion rate between 70% and 100%.
  - venn_diagram.svg: compares all pathways.
  - venn_diagram_50.svg: compares the AuCoMe, gapseq, and YeastCyc pathways with a completion rate below 50%.
  - venn_diagram_50_gapseq.svg: compares all gapseq pathways, whatever their completion rate, with the AuCoMe and YeastCyc pathways with a completion rate below 50%.
  - venn_diagram_50_70.svg: compares the AuCoMe, gapseq, and YeastCyc pathways with a completion rate between 50% and 70%.
  - venn_diagram_50_70_gapseq.svg: compares all gapseq pathways, whatever their completion rate, with the AuCoMe and YeastCyc pathways with a completion rate between 50% and 70%.
  - venn_diagram_70_100_gapseq.svg: compares all gapseq pathways, whatever their completion rate, with the AuCoMe and YeastCyc pathways with a completion rate between 70% and 100%.
  - yeast_cyc_comparison.tsv: the numbers of true-positive, false-positive, and false-negative pathways found by each method (AuCoMe and gapseq).
* Figure_S10_Deepec_fungal.tsv: for each species of the fungal dataset and each AuCoMe step (robust orthology, non-robust orthology, and annotation or orthology), gives the number of reactions, the number of ECs, the number of ECs validated by DeepEC, and the ratio of ECs validated by DeepEC to the total number of ECs. It was used to create Figure S10(a).
* networks_aucome: for each of the 5 fungi (L. bicolor, N. crassa, R. oryzae, S. cerevisiae S288C, and S. pombe), a metabolic network in the PADMet format obtained with AuCoMe.
* networks_gapseq: 5 subdirectories (one per fungus), each containing two files describing the metabolic network obtained with gapseq:
  - species-all-Pathways.tbl: pathway data in the TBL format.
  - species-all-Reactions.tbl: reaction data in the TBL format.
* networks_modelseed: for each of the 5 fungi (L. bicolor, N. crassa, R. oryzae, S. cerevisiae S288C, and S. pombe), two TSV files:
  - species.gbk_genome.draftModel-compounds.tsv: compound data.
  - species.gbk_genome.draftModel-reactions.tsv: reaction data.
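Figure_3_fmeasure_steps.tsv stores F-measures of the recovered E. coli K12 MG1655 networks against the EcoCyc gold standard. To make explicit what such a score measures, here is a minimal Python sketch. It assumes that each network is reduced to a set of reaction identifiers and that the F-measure is the usual harmonic mean of precision and recall; the exact matching criterion used for the paper is not stated in this description, and the function name and reaction identifiers are hypothetical.

```python
def f_measure(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) of a predicted reaction set
    against a gold-standard reaction set (hypothetical helper)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # No overlap (or an empty set): every score is zero; avoids dividing by zero.
        return 0.0, 0.0, 0.0
    precision = true_positives / len(predicted)  # share of recovered reactions that are correct
    recall = true_positives / len(gold)          # share of gold-standard reactions recovered
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reaction identifiers, for illustration only.
gold_standard = {"RXN-1", "RXN-2", "RXN-3", "RXN-4"}
recovered = {"RXN-1", "RXN-2", "RXN-3", "RXN-9"}

p, r, f = f_measure(recovered, gold_standard)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

With these toy sets, three of the four recovered reactions belong to the gold standard, so precision, recall, and F-measure are all 0.75. A degraded genome that loses reactions mainly lowers recall, while reactions wrongly propagated mainly lower precision.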
2/ Content of the code directory

It gathers two directories: aucome_v0.5.1 and padmet_v5.0.1.

2.1/ Content of the aucome_v0.5.1 subdirectory

This directory contains a copy of the AuCoMe project from GitHub: https://github.com/AuReMe/aucome (downloaded on 15/11/2022). It is composed of two subdirectories and five files:
* LICENCE: licence of the AuCoMe software.
* README.rst: README of the AuCoMe software.
* requirements.txt: list of the required Python packages.
* setup.cfg: metadata about the AuCoMe package, used with setup.py to distribute AuCoMe.
* setup.py: information relevant to the AuCoMe package, including options and metadata. It is used to distribute AuCoMe on PyPI and to create an entry point when installing it with pip.
* recipes: contains two files:
  - Dockerfile: instructions to run AuCoMe in a Docker environment.
  - Singularity: instructions to run AuCoMe in a Singularity container.
* aucome: contains 11 Python files:
  - __init__.py: marks the directory as a Python package.
  - __main__.py: functions implementing the command-line interface of AuCoMe.
  - analysis.py: functions to analyse the AuCoMe results.
  - check.py: functions to check the input files.
  - compare.py: functions to compare the AuCoMe results between two distinct subgroups.
  - orthology.py: functions to propagate reactions through orthology.
  - reconstruction.py: functions to reconstruct draft GSMNs with Pathway Tools, in parallel.
  - spontaneous.py: functions to add spontaneous reactions to GSMNs when they complete a MetaCyc metabolic pathway.
  - structural.py: functions to check that no reactions are missing because of missing gene structures. A genomic search is performed for all reactions present in one organism but not in another.
  - utils.py: a function to parse the configuration file.
  - workflow.py: functions to run all the steps of AuCoMe.

2.2/ Content of the padmet_v5.0.1 subdirectory

This directory contains a copy of the PADMET project from GitHub: https://github.com/AuReMe/padmet/ (downloaded on 15/11/2022). It is composed of two subdirectories and six files:
* CHANGELOG.md: records all notable changes made to the PADMET project.
* docs: all the documentation files of the PADMET package, in the RST format.
* LICENCE: licence of the PADMET package.
* README.md: manual of the PADMET package.
* requirements.txt: list of the required Python packages.
* setup.cfg: metadata about the PADMET package, used with setup.py to distribute PADMET.
* setup.py: information relevant to the PADMET package, including options and metadata. It is used to distribute PADMET on PyPI and to create an entry point when installing it with pip.
* padmet: gathers two files and two subdirectories:
  - __init__.py: indicates the version of PADMET.
  - __main__.py: functions implementing the command-line interface of PADMET.
  - classes: contains 7 files.
  - utils: contains 4 files and 3 subdirectories.

2.2.1/ Content of the classes subdirectory

The classes directory contains 7 files:
* __init__.py: marks the directory as a Python package.
* instantiation.py: a function to instantiate a PADMET object.
* node.py: defines the Node class, which represents an element of a metabolic network (e.g., a compound or a reaction).
* padmetRef.py: defines the PadmetRef class, which represents a metabolic network database.
* padmetSpec.py: defines the PadmetSpec class, which represents the metabolic network of a species/organism based on a PadmetRef reference database.
* policy.py: defines the Policy class, which specifies the types of Relations and Nodes of a network.
* relation.py: defines the Relation class, which represents a link between two elements (Nodes) of a metabolic network.

2.2.2/ Content of the utils subdirectory

The utils directory contains 4 files and 3 subdirectories:
* __init__.py: marks the directory as a Python package.
* gbr.py: implements a lexical analysis of the gene relationships associated with a reaction, either a complex (AND relation between genes) or isozymes (OR relation between genes).
* sbmlPlugin.py: functions to handle SBML elements (e.g., species or reactions) and return all their notes sections in a dictionary.
* utils.py: a function that checks file paths.
* connection: contains 22 files:
  - __init__.py: marks the directory as a Python package.
  - biggAPI_to_padmet.py: extracts the BiGG database through its API to create a PADMET file. Internet access is required.
  - check_orthology_input.py: checks whether the metabolic network and the proteome of the model organism use the same gene identifiers (or at least share more than a given cutoff) before running orthology-based reconstruction.
  - enhanced_meneco_output.py: extracts the results of Meneco gap-filling and returns a PADMET file with additional information for each gap-filled reaction.
  - extract_orthofinder.py: after running OrthoFinder on n FASTA files, reads the output file Orthogroups.tsv to identify the orthologous genes. AuCoMe uses it to extract the orthologous genes.
  - extract_rxn_with_gene_assoc.py: from a given SBML file, creates an SBML file containing only the reactions associated with a gene.
  - gbk_to_faa.py: extracts the protein sequences of a GenBank file into a FASTA file with the Biopython package.
  - gene_to_targets.py: from a list of genes, gets the products of the reactions linked to these genes. For example, if R1 is linked to G1 and R1 produces M1 and M2, the script outputs M1 and M2.
  - get_metacyc_ontology.py: creates the MetaCyc ontology from the MetaCyc PadmetRef.
  - metexploreviz_export.py: converts a PADMET object representing a metabolic network into a JSON file compatible with MetExplore.
  - modelSeed_to_padmet.py: creates a PADMET file from ModelSEED reaction and pathway files.
  - network_to_gnn.py: creates input for GNNs (Graph Neural Networks) from a PADMET or SBML file.
  - padmet_to_asp.py: converts a PADMET file to Answer Set Programming (ASP).
  - padmet_to_matrix.py: creates a stoichiometric matrix from a PADMET file, in which columns represent reactions and rows represent metabolites.
  - padmet_to_padmet.py: merges 1 to n PADMET files.
  - padmet_to_tsv.py: converts a PADMET file representing a database (PadmetRef) and/or a PADMET file representing a model (PadmetSpec) to TSV files.
  - pgdb_to_padmet.py: reads a PGDB folder (from BioCyc/Pathway Tools) and creates a PADMET file. AuCoMe uses it to create PADMET files from PGDBs in the annotation-based step.
  - sbmlGenerator.py: functions to generate SBML files from PADMET and TXT files using the libsbml package. AuCoMe uses it to create SBML files at the annotation-based, orthology, and final steps.
  - sbml_to_curation_form.py: extracts one or several reactions from an SBML file into the form used in AuReMe for curation.
  - sbml_to_padmet.py: converts an SBML file into a PADMET file (with or without a reference database).
  - sbml_to_sbml.py: creates an SBML file from another one, for example to change the SBML level.
  - wikiGenerator.py: all the functions needed to generate wiki pages from a PADMET file and to update an online wiki. It requires the WikiManager module (with wikiMate, Vendor).
* exploration: contains 15 files:
  - __init__.py: marks the directory as a Python package.
  - compare_padmet.py: compares 1 to n PADMET files and creates a folder with 4 output files (compounds.tsv, genes.tsv, pathways.tsv, and reactions.tsv). AuCoMe uses it to create these files to analyse the metabolic networks.
  - compare_sbml.py: compares two or more SBML files and creates two output files, reactions.tsv and metabolites.tsv, listing the reactions/metabolites of each SBML file.
  - compare_sbml_padmet.py: compares the reaction identifiers of an SBML file and a PADMET file, and returns the number of reactions present in both as well as the reaction identifiers missing from the SBML or from the PADMET file.
  - convert_sbml_db.py: uses the MetaNetX database to check or convert an SBML file. It requires MetaNetX flat files, which can be found in the AuReMe workflow or on the MetaNetX website.
  - dendrogram_reactions_distance.py: uses the reactions.tsv file from compare_padmet.py to create a dendrogram with the R package pvclust. It was used in the article to create the metabolic dendrogram.
  - flux_analysis.py: runs flux balance analysis with the cobra package on an already defined reaction; the objective_coefficient value of that reaction must be set to 1 in the SBML file.
  - get_pwy_from_rxn.py: from a file containing a list of reactions, returns the pathways in which these reactions are involved.
  - padmet_stats.py: creates a PADMET stats file (named padmet_stats.tsv) containing the number of pathways, reactions, genes, and compounds in one or several PADMET files.
  - pathway_production.py: compares 1 to n PADMET objects to show the inputs/outputs of their pathways.
  - prot2genome.py: functions to search a genome using protein sequences and Gene-Protein-Reaction associations. It is used in the structural search step of AuCoMe.
  - report_network.py: creates reports for a PADMET file as three TSV files (all_metabolites.tsv, all_pathways.tsv, and all_reactions.tsv).
  - visu_network.py: visualizes a metabolic network from the compounds' perspective.
  - visu_path.py: visualizes a pathway of a PADMET network.
  - visu_similarity_gsmn.py: visualizes the similarity between metabolic networks using MDS.
* management: contains 5 files:
  - __init__.py: marks the directory as a Python package.
  - manual_curation.py: updates a PadmetSpec object from specific curation forms. It either creates new reactions in the PADMET file or adds/removes reactions from a PadmetRef.
  - padmet_compart.py: checks and updates the compartments of a given PADMET file.
  - padmet_medium.py: for a given set of compounds representing the growth medium (or seeds), creates two reactions to maintain the consistency of the network for flux analysis.
  - relation_curation.py: adds or removes relations between the nodes of a given PADMET file.

3/ Content of the datasets directory

It contains one file and four directories:
* metacyc_23.5.padmet: version 23.5 of the MetaCyc database in the PADMET format. AuCoMe used it to reconstruct all the metabolic networks, so it is required to reproduce the results of the article.

3.1/ Content of the algal, bacterial, and fungal directories

These three directories are composed of 8 subdirectories and a Supplemental Table (Table S1, Table S2, and Table S3 in the bacterial, fungal, and algal directories, respectively):
* FASTA: the proteome of each species as a FASTA file.
* cleaned_GBKs: for each species, the annotated genome with the protein sequences, in a GenBank format file.
* dictionaries: for some species, genes had to be renamed for compatibility reasons. This folder contains CSV files mapping the old gene names to the new ones.
* annotated_DATs: one subdirectory per species with all the output files of Pathway Tools v23.5, without any post-processing, in the DAT format.
* annotated_PADMETs: for each species, the metabolic network of the draft reconstruction step of AuCoMe, in the PADMET format.
* final_PADMETs: for each species, the metabolic network generated by the AuCoMe workflow, in the PADMET format.
* final_SBMLs: for each species, the metabolic network generated by the AuCoMe workflow, in the SBML format.
* panmetabolism: composed of 7 files describing the final metabolic networks:
  - genes.tsv: for each organism, the list of genes and the associated reactions.
  - metabolites.tsv: the list of metabolites present in the panmetabolism and, for each metabolite and each organism, the reactions that produce this compound and the reactions that consume it.
  - pathways.tsv: the list of pathways present in the panmetabolism and, for each pathway and each organism, the number of reactions of this pathway that are present and their names.
  - reactions.tsv: the list of reactions present in the panmetabolism and, for each reaction, whether it is present in each organism. If a reaction is found in a species, the associated genes are also listed.
  - pvclust_reaction_dendrogram.png: dendrogram built from the presence/absence matrix of reactions in the species of the dataset. Jaccard distances are computed between species, then hierarchical clustering with complete linkage is applied. The R package pvclust is used to create the dendrogram with bootstrap resampling; for each node, a p-value indicates how strongly the cluster is supported by the data.

3.2/ Content of the synthetic_bacterial directory

The synthetic_bacterial directory contains Supplemental Table S4 and 32 subdirectories named Run_00, Run_01, ..., Run_31. Each subdirectory is composed of 9 files:
* K_12_MG1655.gbk: the annotated genome of E. coli K12 MG1655 whose functional and/or structural annotations were degraded.
* annotated_K_12_MG1655.sbml: the metabolic network of E. coli K12 MG1655 produced by the draft reconstruction step of AuCoMe, in the SBML format.
* annotated_K_12_MG1655.padmet: the metabolic network of E. coli K12 MG1655 produced by the draft reconstruction step of AuCoMe, in the PADMET format.
* orthology_K_12_MG1655.sbml: the metabolic network of E. coli K12 MG1655 produced by the orthology propagation step of AuCoMe, in the SBML format.
* orthology_K_12_MG1655.padmet: the metabolic network of E. coli K12 MG1655 produced by the orthology propagation step of AuCoMe, in the PADMET format.
* structural_K_12_MG1655.sbml: the metabolic network of E. coli K12 MG1655 produced by the structural verification step of AuCoMe, in the SBML format.
* structural_K_12_MG1655.padmet: the metabolic network of E. coli K12 MG1655 produced by the structural verification step of AuCoMe, in the PADMET format.
* final_K_12_MG1655.sbml: the metabolic network of E. coli K12 MG1655 produced by the AuCoMe workflow, in the SBML format.
* final_K_12_MG1655.padmet: the metabolic network of E. coli K12 MG1655 produced by the AuCoMe workflow, in the PADMET format.

4/ Content of the scripts_analyses directory

The scripts directory contains 12 files:
* bacteria_random_degradation.py: used to degrade the E. coli K12 MG1655 genome. The degradation procedure is described in Algorithm 1.
* figure_2_algal_dataset.py: generates Figure 2D, which shows the number of reactions for each species of the algal dataset at each AuCoMe step.
* figure_2_bacterial_dataset.py: generates Figure 2B, which shows the number of reactions for each species of the bacterial dataset at each AuCoMe step.
* figure_2_fungal_dataset.py: generates Figure 2C, which shows the number of reactions for each species of the fungal dataset at each AuCoMe step.
* figure_3_degradation.py: generates Figure 3B from the Figure_3_fmeasure_steps.tsv file (described above).
* figure_5_mds.py: generates Figure 5A from two reactions.tsv files of the algal dataset (annotation-based and final).
* figure_S4_comparison_bacteria.py: computes statistics on the 29 bacterial metabolic networks reconstructed with AuCoMe, CarveMe, gapseq, and ModelSEED. It uses the mapping_modelseed_ec.tsv and soft_stat.tsv files and the analyses/bacteria/networks_soft directories, then creates Supplemental Fig. S4 and the files of the analyses/bacteria/Figure_S4_output directory.
* figure_S5_reference_catalog.py: reads the ecocyc.padmet and kegg_ecs.txt files and the jsons_bigg/ and jsons_modelseed/ directories, then creates the Figure_S5_refence_ec_catalog_K12MG1655.tsv file and Supplemental Fig. S5.
* figure_S6.py: reads the Figure_S5_refence_ec_catalog_K12MG1655.tsv file and the E. coli K12 MG1655 files of the analyses/bacteria/networks_soft directories, then generates Supplemental Fig. S6.
* figure_S7_comparison_pathway_fungi.py: reads the metacyc_23.5.padmet file and the files of the analyses/fungi/networks_soft directories, then creates the five completion_pathway_species.svg images that make up Supplemental Fig. S7.
* figures_S8_S9.py: reads metacyc_23.5.padmet, the metabolic networks of S. cerevisiae S288C reconstructed with AuCoMe and gapseq, and YeastCyc. It computes the completion rates of all the pathways of the AuCoMe and gapseq networks, then compares the results obtained with AuCoMe and gapseq on S. cerevisiae S288C with YeastCyc according to these completion rates. It generates Supplemental Figs. S8 and S9.
* figure_S12_supervenn.py: generates Figure S12 from the reactions.tsv file of the algal dataset at the final AuCoMe step and another tabular file containing abbreviated species names.
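The panmetabolism dendrogram (pvclust_reaction_dendrogram.png, created with dendrogram_reactions_distance.py) and the MDS of figure_5_mds.py both start from the presence or absence of reactions across species. The following standard-library Python sketch illustrates the two steps described for the dendrogram, Jaccard distances between species followed by complete-linkage hierarchical clustering. It is not the code of dendrogram_reactions_distance.py, it omits the bootstrap resampling and p-values that pvclust provides, and the function names, species, and reactions are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_linkage(profiles: dict[str, set[str]]) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering with complete linkage (hypothetical helper).

    Returns the successive merges as (cluster_a, cluster_b, height), where the
    height is the largest pairwise Jaccard distance between members of the two clusters.
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Complete linkage: the distance between clusters is that of their farthest members.
            height = max(jaccard_distance(profiles[x], profiles[y]) for x in c1 for y in c2)
            if best is None or height < best[2]:
                best = (c1, c2, height)
        c1, c2, height = best
        # Replace the two closest clusters by their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, height))
    return merges


# Hypothetical presence/absence data: species -> reactions present in its network.
profiles = {
    "sp_A": {"R1", "R2", "R3", "R4"},
    "sp_B": {"R1", "R2", "R3"},
    "sp_C": {"R5", "R6"},
}
for a, b, h in complete_linkage(profiles):
    print(sorted(a), sorted(b), round(h, 3))
```

With these toy profiles, sp_A and sp_B (Jaccard distance 0.25) are merged first, and sp_C joins them at height 1.0 because it shares no reaction with either. The quadratic search over cluster pairs is fine for a few dozen genomes; larger datasets would call for a dedicated clustering library.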
