Multi-method gene clusters at species-level resolution for 125 prokaryotic pangenomes (Q6696458)


Dataset published at Zenodo repository.

      Statements

      This dataset contains 9 sets of species-level gene clusters and high-resolution species trees for 125 representative bacterial and archaeal species, encompassing a total of 6,851 nearly complete genomes. Each set represents a different approach to homology-, orthology-, and synteny-based gene clustering as implemented by 6 popular tools for comparative genomics and pangenome analysis (Roary, panX, OrthoFinder, MMseqs2/PanACoTa, CD-HIT, and eggNOG-mapper). For Escherichia coli, Cutibacterium acnes, Bacteroides uniformis, and Staphylococcus epidermidis, we provide additional sets that combine high-quality genomes with different proportions of medium- and low-quality metagenome-assembled genomes (MAGs). This dataset is a helpful resource for benchmarking gene clustering tools and pangenome analysis workflows, as well as for testing their robustness with respect to the presence of incomplete or contaminated genomic assemblies.
      Reference: Manzano-Morales S, Liu Y, González-Bodí S, Huerta-Cepas J, Iranzo J. 2022. Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses. bioRxiv. doi: 10.1101/2022.09.25.509376
      4 October 2023
      3.0
