Automated benchmarking of combined protein structure and ligand conformation prediction
DOI: 10.5281/zenodo.8348280 | Zenodo: 8348280 | MaRDI QID: Q6691848 | FDO: Q6691848
Dataset published at Zenodo repository.
Michèle Leemann, Xavier Robin, Jerome Eberhardt, Torsten Schwede, Ander Sagasta, Janani Durairaj
Publication date: 15 September 2023
Copyright license: Creative Commons Attribution 4.0 International
The prediction of protein-ligand complexes (PLC), using both experimental and predicted structures, is an active and important area of research, underscored by the inclusion of the Protein-Ligand Interaction category in the latest round of the Critical Assessment of Protein Structure Prediction experiment, CASP15. The prediction task in CASP15 consisted of predicting both the three-dimensional structure of the receptor protein and the position and conformation of the ligand. This paper addresses the challenges of devising automated benchmarking techniques for PLC prediction and proposes solutions. The reliability of experimentally solved PLC as ground truth reference structures is assessed using various validation criteria. The similarity of a PLC to previously released complexes is used to judge PLC diversity and the difficulty of a PLC as a prediction target. We show that the commonly used PDBBind time-split test-set is inappropriate for comprehensive PLC evaluation, with state-of-the-art tools showing conflicting results on a more representative and high-quality dataset constructed for benchmarking purposes. We also show that redocking on crystal structures is a much simpler task than docking into predicted protein models, as demonstrated by two scoring metrics created specifically for PLC prediction. Finally, we introduce a fully automated pipeline that predicts PLC and evaluates the accuracy of the protein structure, ligand pose, and protein-ligand interactions.

This repository contains:

all_validation_clustering_data.tsv - X-ray validation data and MMSeqs cluster identifiers at different sequence identities for over a million small molecule and ion-binding pockets in the PDB.

hqr_dataset.tsv - PDB IDs and ligand information for the high-quality representative (HQR) dataset described in the manuscript.

score_files.tar.gz - Full docking results for all detected pockets for the PDBBind time-split test-set, the HQR dataset, and the subsets of AF models created for both datasets. One file per tool benchmarked, with the following columns: Tool, Complex, Pocket, Rank, lDDT-PLI, lDDT-LP, BiSyRMSD, Reference_Ligand, Tool-generated Score.

errors_all_sets.csv - Report of failures running the pipeline, with the following columns: Process, Complex/Ligand/Receptor, Problem.
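The per-tool score files lend themselves to a simple summary such as a top-ranked success rate. The following is a minimal sketch of that idea, not the authors' evaluation code. It uses the column names listed above, but several things are assumptions: the rows in `SAMPLE` are synthetic and only illustrate the layout, the comma delimiter is a guess, the lowest `Rank` value is taken to be the tool's best pose, and the 2.0 Å BiSyRMSD cutoff is a commonly used docking convention rather than a threshold stated on this page. The function name `top1_success_rate` is hypothetical.

```python
import csv
import io

# Synthetic rows for illustration only; they are NOT taken from the dataset.
# Column names follow the repository description; the delimiter is an assumption.
SAMPLE = """Tool,Complex,Pocket,Rank,lDDT-PLI,lDDT-LP,BiSyRMSD,Reference_Ligand,Tool-generated Score
toolA,cplx1,p1,1,0.85,0.90,1.2,LIG1,-9.1
toolA,cplx1,p1,2,0.40,0.88,4.7,LIG1,-8.0
toolA,cplx2,p1,1,0.30,0.75,6.3,LIG2,-7.5
toolA,cplx2,p1,2,0.80,0.76,1.8,LIG2,-7.2
"""


def top1_success_rate(handle, rmsd_cutoff=2.0, delimiter=","):
    """Fraction of (Complex, Pocket) pairs whose best-ranked pose
    has BiSyRMSD at or below the cutoff (assumed to be in angstroms)."""
    best = {}  # (complex, pocket) -> (rank, rmsd) of the lowest-rank pose seen
    for row in csv.DictReader(handle, delimiter=delimiter):
        key = (row["Complex"], row["Pocket"])
        rank = int(row["Rank"])
        rmsd = float(row["BiSyRMSD"])
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rmsd)
    if not best:
        return 0.0
    hits = sum(1 for _, rmsd in best.values() if rmsd <= rmsd_cutoff)
    return hits / len(best)


rate = top1_success_rate(io.StringIO(SAMPLE))
print(rate)
```

To apply this to a real file, pass an open file handle from the extracted archive instead of `io.StringIO(SAMPLE)` and adjust `delimiter` to match. Real files may contain missing or non-numeric values, which this sketch does not handle.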