Full-length and split homologs of human proteins in the gut microbiome




DOI: 10.5281/zenodo.14037045
Zenodo: 14037045
MaRDI QID: Q6708095
FDO: Q6708095

Dataset published in the Zenodo repository.

Patrick Bradley, Matthew Rendina, Peter J Turnbaugh

Publication date: 5 November 2024

Copyright license: Creative Commons Attribution 4.0 International



These files were generated as part of the manuscript "Human xenobiotic metabolism proteins have full-length and split homologs in the gut microbiome" (submitted). The .ipc files are tables of full-length (full_humcover3.ipc) and split (part_humcover3.ipc) homologs of human proteins in the gut microbiome. Note that our pipeline collapses full-length alignments to the same UHGP-90 protein family into a single entry per species, with the number of genomes reported in the column nGenomes. Split homologs are not collapsed, because they are defined by genomic context, and this context may differ between individual genomes.

These files are in Arrow IPC format, which provides compression and fast I/O for large tables. We recommend reading them with Polars (pola.rs) or the R Arrow package. Because the full-length homolog table is large, you may wish to query it without loading it into memory, which can be done with scan_ipc in Polars or open_dataset in R Arrow.

We also provide gzipped .csv files of full-length (pgkb_FH_drugs.csv.gz) and split (pgkb_SH_drugs.csv.gz) homologs organized by their PharmGKB annotations. For each drug annotated in PharmGKB as being metabolized by a human protein with full-length or split homologs, these files list the responsible human protein(s), its xenobiotic enzyme class, the bacterial protein homolog(s), the length and percent identity of the alignment, and either the specific genome (column g, split homologs only) or the number of genomes (column nGenomes, full-length homologs only).

Xenobiotic enzyme classes are defined as in Figure 3 of the manuscript, with three additional classes: "nucl" (nucleobase-containing metabolic proteins not annotated to any other class), "redox" (oxidoreductases not annotated to any other class), and "other" (all remaining proteins).
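The two PharmGKB files record genomes differently: a full-length row stands for nGenomes genomes, while a split-homolog row refers to the single genome named in column g. The sketch below uses only the Python standard library to tally genome-level hits per drug in the same way for both files. The drug column name "drug" and the helper name genome_hits_per_drug are hypothetical; check the actual header of pgkb_FH_drugs.csv.gz and pgkb_SH_drugs.csv.gz before use.

```python
import csv
import gzip
import os
import tempfile
from collections import Counter


def genome_hits_per_drug(path: str, drug_col: str = "drug") -> dict[str, int]:
    """Tally genome-level homolog hits per drug in a gzipped PharmGKB table.

    Full-length rows (nGenomes column) count as nGenomes hits each; split rows
    (g column, one genome per row) count as one hit. A genome with several
    homolog rows for the same drug is counted once per row, not deduplicated.
    drug_col is a HYPOTHETICAL column name; check the real header first.
    """
    hits: Counter[str] = Counter()
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            n = int(row["nGenomes"]) if "nGenomes" in row else 1
            hits[row[drug_col]] += n
    return dict(hits)


if __name__ == "__main__":
    # Synthetic stand-ins for pgkb_FH_drugs.csv.gz and pgkb_SH_drugs.csv.gz.
    with tempfile.TemporaryDirectory() as tmp:
        fh_path = os.path.join(tmp, "fh_demo.csv.gz")
        with gzip.open(fh_path, "wt", newline="") as f:
            f.write("drug,nGenomes\ndrugA,5\ndrugA,2\ndrugB,1\n")
        sh_path = os.path.join(tmp, "sh_demo.csv.gz")
        with gzip.open(sh_path, "wt", newline="") as f:
            f.write("drug,g\ndrugA,GENOME1\ndrugA,GENOME2\ndrugC,GENOME1\n")
        print(genome_hits_per_drug(fh_path))
        print(genome_hits_per_drug(sh_path))
```

Because full-length rows are already collapsed per species, these counts are hits rather than distinct genomes. Counting distinct genomes is only possible for the split table, from the g column.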






