Multi-method gene clusters at species-level resolution for 125 prokaryotic pangenomes (Q6696458)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| default for all languages | No label defined | | |
| English | Multi-method gene clusters at species-level resolution for 125 prokaryotic pangenomes | Dataset published at Zenodo repository. | |
Statements
This dataset contains 9 sets of species-level gene clusters and high-resolution species trees for 125 representative bacterial and archaeal species, encompassing a total of 6,851 nearly complete genomes. Each set represents a different approach to homology-, orthology-, and synteny-based gene clustering as implemented by 6 popular tools for comparative genomics and pangenome analysis (Roary, panX, OrthoFinder, MMseqs2/PanACoTa, CD-HIT, and eggNOG-mapper). For Escherichia coli, Cutibacterium acnes, Bacteroides uniformis, and Staphylococcus epidermidis, we provide additional sets that combine high-quality genomes with different proportions of medium- and low-quality metagenome-assembled genomes (MAGs). This dataset is a helpful resource for benchmarking gene clustering tools and pangenome analysis workflows, as well as for testing their robustness with respect to the presence of incomplete or contaminated genomic assemblies.
Reference: Manzano-Morales S, Liu Y, González-Bodí S, Huerta-Cepas J, Iranzo J. 2022. Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses. bioRxiv. doi: 10.1101/2022.09.25.509376
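As an illustration of the benchmarking use mentioned above, the following minimal sketch compares two gene clusterings of the same genes by the fraction of gene pairs on which they agree (the Rand index). It is not taken from the dataset or the reference. The input layout (a dictionary mapping gene identifiers to cluster identifiers), the function name `rand_index`, and the toy identifiers are all assumptions made for illustration; the actual file formats in the Zenodo archive would have to be parsed into this shape first.

```python
from itertools import combinations


def rand_index(clusters_a, clusters_b):
    """Fraction of gene pairs on which two clusterings agree.

    Both arguments are hypothetical dicts mapping gene ID -> cluster ID.
    A pair counts as agreement when both clusterings put the two genes
    together, or both keep them apart. Only genes present in both
    clusterings are compared.
    """
    genes = sorted(set(clusters_a) & set(clusters_b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    agree = 0
    total = 0
    # Quadratic in the number of genes: fine for a toy example,
    # too slow for a full pangenome without further optimisation.
    for g1, g2 in combinations(genes, 2):
        same_a = clusters_a[g1] == clusters_a[g2]
        same_b = clusters_b[g1] == clusters_b[g2]
        agree += same_a == same_b
        total += 1
    return agree / total


# Toy data (made up): four genes clustered by two hypothetical methods.
method_a = {"g1": "A1", "g2": "A1", "g3": "A2", "g4": "A2"}
method_b = {"g1": "B1", "g2": "B1", "g3": "B1", "g4": "B2"}
score = rand_index(method_a, method_b)
print(score)
```

The pairwise loop is chosen for readability. For real pangenomes a contingency-table formulation, or a chance-corrected measure such as the adjusted Rand index, would likely be more appropriate.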
4 October 2023
3.0