A large comprehensive curated dataset of small molecules and their activities covering three cardiac ion channels: hERG, Cav1.2, and Nav1.5




DOI: 10.5281/zenodo.8359714
Zenodo: 8359714
MaRDI QID: Q6683872
FDO: Q6683872

Dataset published in the Zenodo repository.

Issar Arab, Kris Laukens, Khaled Barakat, Ke Chen, Wout Bittremieux, Kristof Egghe

Publication date: 9 August 2023

Copyright license: Creative Commons Attribution 4.0 International



The compressed data folder (dataset.rar) provides a data framework for drug discovery researchers to perform in-depth analyses on a large, open-access, comprehensive, integrated hERG, Nav1.5, and Cav1.2 cardiotoxicity database of small molecules and their activities.

The database is organized as follows:

- Each sub-folder represents a cardiac ion channel target: hERG, Nav1.5, and Cav1.2.
- Each target sub-folder consists of 3 files in CSV format:
  - One file contains the development set (split into training and validation sets using an 80/20 ratio for hyperparameter tuning).
  - The other 2 files contain external evaluation sets. The first test dataset consists of compounds with a structural similarity of no more than 60% (Tanimoto similarity of at most 0.6) to the remaining development set, while the second test dataset comprises compounds with a structural similarity of no more than 70% (Tanimoto similarity of at most 0.7) to the remaining development set.
- Each file contains data with 7 columns:
  - "InChl Key" as a unique identifier of the chemical structure,
  - "SMILES" as the string format for storage and exchange of the chemical structure,
  - "Source" as the upstream data source from which the data were retrieved,
  - "ChEMBL ID" as the ChEMBL identifier if the compound comes from the ChEMBL database,
  - "PubChem CID" as the PubChem compound identifier if the compound comes from the PubChem database,
  - "pIC50" as the negative logarithm of the half-maximal inhibitory concentration (IC50), describing the potency of the compound, and
  - "USED_AS" as the column specifying whether the compound was used for training or validation.

Upon usage, please cite this publication:

Issar Arab, Kristof Egghe, Kris Laukens, Ke Chen, Khaled Barakat, Wout Bittremieux, Benchmarking of Small Molecule Feature Representations for hERG, Nav1.5, and Cav1.2 Cardiotoxicity Prediction, Journal of Chemical Information and Modeling, (2023). doi:10.1021/acs.jcim.3c01301
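As a rough illustration of how the files described above might be used, the following Python sketch groups the rows of a development-set CSV by the "USED_AS" column, converts between pIC50 and IC50, and computes a Tanimoto similarity between two fingerprints. It is a minimal sketch under stated assumptions, not the authors' pipeline: the sample rows and the "Train"/"Validation" labels are made up, the function names are hypothetical, pIC50 is assumed to refer to IC50 in mol/L (the usual convention), and fingerprints are represented as plain sets of on-bit indices because the description does not say which fingerprint type was used for the similarity thresholds.

```python
import csv
import io
import math
from collections import defaultdict


def ic50_to_pic50(ic50_molar):
    """pIC50 = -log10(IC50), with IC50 given in mol/L (assumed convention)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50):
    """Invert the transform and report IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (9.0 - pic50)


def group_by_usage(handle, column="USED_AS"):
    """Group CSV rows by the value found in `column`, whatever the labels are."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[column]].append(row)
    return dict(groups)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def max_similarity(candidate, reference_fps):
    """Highest Tanimoto similarity between one fingerprint and a reference set."""
    return max((tanimoto(candidate, fp) for fp in reference_fps), default=0.0)


# Made-up rows for illustration only; not taken from the dataset.
# The "Train"/"Validation" labels are assumptions about the USED_AS values.
SAMPLE_CSV = """SMILES,pIC50,USED_AS
CCO,5.2,Train
CCN,6.0,Train
CCC,4.8,Validation
"""

if __name__ == "__main__":
    groups = group_by_usage(io.StringIO(SAMPLE_CSV))
    for label, rows in groups.items():
        print(label, len(rows))

    # IC50 in nM corresponding to a pIC50 of 6
    print(pic50_to_ic50_nm(6.0))

    # Toy fingerprints: check a candidate against a 0.6 similarity cut-off
    dev_fps = [{1, 2, 3}, {2, 3, 4, 5}]
    print(max_similarity({1, 2, 9}, dev_fps) <= 0.6)
```

To apply this to the real data, replace io.StringIO(SAMPLE_CSV) with an opened CSV file from one of the target sub-folders, and generate the fingerprints with a cheminformatics toolkit of your choice from the "SMILES" column. The grouping function does not depend on particular "USED_AS" labels, so it can be used to inspect which values the column actually contains.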






