Library of Two Million Unique Small Molecules with Precalculated Fingerprints, Descriptors, and Cardiotoxicity Inhibition Data

From MaRDI portal



DOI: 10.5281/zenodo.11066707
Zenodo: 11066707
MaRDI QID: Q6683897
FDO: Q6683897

Dataset published in the Zenodo repository.

Wout Bittremieux, Kris Laukens, Issar Arab

Publication date: 25 April 2024

Copyright license: Creative Commons Attribution 4.0 International



This repository contains a dataset of approximately 2 million unique compounds stored in an HDF5 small-molecule library store. Each molecule has the following fields:

- InChIKey
- Standardized SMILES string
- Compound source
- ChEMBL identifier, if the compound is present in this open-access database
- 1024-bit Morgan fingerprint
- 2048-bit Morgan fingerprint
- 881-bit PubChem fingerprint
- 854-dimensional vector of preprocessed and standardized Mordred descriptors
- Predicted inhibition of each of the three cardiac ion channels (hERG, Nav1.5, and Cav1.2), computed with CtoxPred2, together with the model's confidence scores

The repository also includes a Jupyter notebook that serves as an introductory guide to querying the library store. To get started, download both files to the same folder, make sure approximately 40 GB of free disk space is available, unzip the library store, and then launch the notebook.

If you use this dataset, please cite: Issar Arab, Kris Laukens, Wout Bittremieux. Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set. Journal of Chemical Information and Modeling (2024). doi:10.1021/acs.jcim.4c01102
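The bundled notebook remains the reference for reading the store. The following is only a minimal sketch of one common use of the precalculated Morgan fingerprints: ranking molecules by Tanimoto similarity to a query. It assumes that each fingerprint has already been loaded from the store as a sequence of 0/1 values. This page does not describe the store's on-disk encoding, keys, or column names, so the `(key, bits)` record layout, the helper names `bits_to_int`, `tanimoto` and `most_similar`, and the placeholder keys are all hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def bits_to_int(bits: Sequence[int]) -> int:
    """Pack a 0/1 sequence (e.g. one 1024-bit fingerprint) into a Python int.

    The first element becomes the most significant bit. Bit order does not
    affect Tanimoto similarity as long as every fingerprint is packed the same way.
    """
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity: |A AND B| / |A OR B| over set bits."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention assumed here: two all-zero fingerprints score 0
    return bin(a & b).count("1") / union


def most_similar(
    query_bits: Sequence[int],
    library: Iterable[Tuple[str, Sequence[int]]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (key, similarity) pairs, most similar first.

    `library` is a hypothetical iterable of (identifier, fingerprint bits) pairs.
    """
    query = bits_to_int(query_bits)
    scored = [(key, tanimoto(query, bits_to_int(bits))) for key, bits in library]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 4-bit fingerprints with placeholder keys, not real InChIKeys.
    toy_library = [
        ("KEY-A", [1, 1, 0, 0]),
        ("KEY-B", [1, 0, 1, 0]),
        ("KEY-C", [0, 0, 1, 1]),
    ]
    print(most_similar([1, 1, 0, 0], toy_library, top_k=2))
    # prints [('KEY-A', 1.0), ('KEY-B', 0.3333333333333333)]
```

Packing bits into integers keeps the sketch dependency-free. For a scan over all ~2 million compounds, a vectorized or toolkit-based implementation would likely be preferable.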






