EnzymeMap
DOI: 10.5281/zenodo.8254726, Zenodo: 8254726, MaRDI QID: Q6693420, FDO: Q6693420
Dataset published at Zenodo repository.
Daniel Probst, Georg K. H. Madsen, Esther Heid, William H. Green
Publication date: 18 April 2023
Copyright license: Creative Commons Attribution 4.0 International
EnzymeMap (enzymemap_v2_brenda2023.csv) is a large dataset of atom-mapped, balanced enzymatic reactions sorted by EC (Enzyme Commission) number. It is intended for machine learning models that predict enzymatic reactions or bioretrosynthesis. For details on the extraction, correction and curation of the data, please refer to the publication "EnzymeMap: Curation, validation and data-driven prediction of enzymatic reactions" by E. Heid, D. Probst, W. H. Green and G. K. H. Madsen. Please cite this publication if you use EnzymeMap. A preprint is available at https://doi.org/10.26434/chemrxiv-2023-jzw9w.

The file raw_unmapped_v2_brenda2023.csv additionally holds the raw, unmapped, uncurated data used in the publication to retrain the transformer models behind the IBM RXN-for-Chemistry platform. Note: The publication uses the newest version of this data (v2_brenda2023), whereas the online server IBM RXN-for-Chemistry was trained on version 1 (brenda2022) prior to release of the publication.

EnzymeMap originates from curated data taken from BRENDA version 2023-1, which was then atom mapped and modestly extended. For some reactions or enzyme classes, BRENDA includes additional (uncurated) information not included in EnzymeMap. If you are searching for more information on a particular reaction or enzyme class, we suggest checking the corresponding BRENDA entry and the original literature sources.

VERSION 2: Correction of erroneous mappings for isomerase reactions, and correction of missing protons in some reactions. Addition of protein information where available. Since different proteins can catalyze the same reaction, the number of reactions in EnzymeMap has increased greatly compared to Version 1. Please remove duplicates where necessary (e.g. if your project does not require protein information, drop the respective columns and then remove duplicates).
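The deduplication step suggested above (drop the protein columns, then remove duplicate rows) could look roughly like the following sketch. It uses only the Python standard library and a tiny made-up sample instead of the real file. The column names `mapped`, `ec_num` and `protein_id` and the helper `drop_columns_and_deduplicate` are hypothetical and chosen for illustration only; check the header of enzymemap_v2_brenda2023.csv for the actual column names before adapting it.

```python
import csv
import io


def drop_columns_and_deduplicate(rows, columns_to_drop):
    """Drop the given columns from each row, then keep the first copy of each distinct row."""
    drop = set(columns_to_drop)
    seen = set()
    unique_rows = []
    for row in rows:
        # Remove the columns that are not needed (e.g. protein information).
        reduced = {key: value for key, value in row.items() if key not in drop}
        # Rows are identical if all remaining column/value pairs match.
        fingerprint = tuple(sorted(reduced.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique_rows.append(reduced)
    return unique_rows


# Made-up sample data; the column names are assumptions, not the real EnzymeMap header.
sample = io.StringIO(
    "mapped,ec_num,protein_id\n"
    "A>>B,1.1.1.1,P1\n"
    "A>>B,1.1.1.1,P2\n"
    "C>>D,2.7.1.1,P3\n"
)
rows = list(csv.DictReader(sample))
deduplicated = drop_columns_and_deduplicate(rows, ["protein_id"])
print(len(rows), "->", len(deduplicated))  # 3 -> 2
```

To apply the same idea to the real dataset, one would open the CSV file in place of the in-memory sample and pass the names of the actual protein-related columns. For a file of this size, a dataframe library may be more convenient, but the logic stays the same.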