Datasets for practical model selection for prospective virtual screening
DOI: 10.5281/zenodo.1411506; Zenodo: 1411506; MaRDI QID: Q6694092; FDO: Q6694092
Dataset published in the Zenodo repository.
F. Michael Hoffmann, Scott A. Wildman, Gene E. Ananiev, Andrew F. Voter, Anthony Gitter, James L. Keck, Shengchao Liu, Moayad Alnammi, Spencer S. Ericksen
Publication date: 7 September 2018
Copyright license: Creative Commons Attribution 4.0 International
This repository contains datasets for the manuscript Practical model selection for prospective virtual screening:

pria_rmi_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets. The files also contain the associated continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints. The dataset has been split into five folds for cross validation.

pria_rmi_pcba_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets as well as public PubChem BioAssay datasets. The files also contain the PriA-SSB and RMI-FANCM continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints. The dataset has been split into five folds for cross validation. Missing values are left blank.

pria_prospective.csv.gz: A compressed file containing chemical screening data for the binary dataset PriA-SSB prospective. The file also contains the continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints.

If you use these data in a publication, please cite:

Shengchao Liu+, Moayad Alnammi+, Spencer S. Ericksen, Andrew F. Voter, Gene E. Ananiev, James L. Keck, F. Michael Hoffmann, Scott A. Wildman, Anthony Gitter. Practical Model Selection for Prospective Virtual Screening. Journal of Chemical Information and Modeling. 2018. doi:10.1021/acs.jcim.8b00363

PubChem data were provided by the PubChem database. Follow the PubChem citation guidelines if you use the PubChem data. See Voter et al. 2017 (PubChem AID 1272365) for the PriA-SSB screening data and Voter et al. 2016 (PubChem AID 1159607) for RMI-FANCM.

Version 1.1.0 updates all of the data files. We standardized the SMILES in all files by generating canonical SMILES with RDKit version 2016.03.4. In addition, we removed 2845 chemicals from pria_prospective.csv.gz that were duplicates of compounds in pria_rmi_cv.tar.gz.
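
As an illustration of how the description above could translate into code, the following is a minimal sketch using only the Python standard library. It reads a gzip-compressed CSV, treats blank cells as missing, and drops rows whose SMILES string already appears in a reference set. The column names (`SMILES`, `label`, `pct_inhibition`) and the helper names are hypothetical, since the actual file headers are not listed here, and this is not the authors' processing code. Matching on strings is only meaningful when both sets were canonicalized the same way, as the description says was done with RDKit.

```python
import csv
import gzip
import io


def read_gzipped_csv(source):
    """Read a gzip-compressed CSV into a list of dicts.

    `source` may be a file path or a binary file object.
    Blank cells are mapped to None, following the note that
    missing values are left blank.
    """
    with gzip.open(source, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [
            {key: (value if value != "" else None) for key, value in row.items()}
            for row in reader
        ]


def drop_known_compounds(rows, known_smiles, smiles_column="SMILES"):
    """Keep only rows whose SMILES string is absent from `known_smiles`.

    Assumes both sides hold canonical SMILES produced the same way;
    `smiles_column` is a hypothetical column name.
    """
    known = set(known_smiles)
    return [row for row in rows if row[smiles_column] not in known]


# Tiny in-memory stand-in for a compressed data file (made-up values).
_demo_text = (
    "SMILES,label,pct_inhibition\n"
    "CCO,0,1.5\n"
    "c1ccccc1,1,\n"
    "CC(=O)O,0,3.0\n"
)
_demo_bytes = gzip.compress(_demo_text.encode("utf-8"))

demo_rows = read_gzipped_csv(io.BytesIO(_demo_bytes))
demo_kept = drop_known_compounds(demo_rows, known_smiles={"CCO"})
```

For the real files, `read_gzipped_csv` would be called with a path such as `pria_prospective.csv.gz`, with `smiles_column` set to whatever the file's header actually uses.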