ESCOTT mutational effect predictions for ProteinGym Substitutions Dataset with Colabfold MSAs
DOI: 10.5281/zenodo.10624677, Zenodo: 10624677, MaRDI QID: Q6698393, FDO: Q6698393
Dataset published at Zenodo repository.
Thomas Henry, Mustafa Tekpinar, Alessandra Carbone
Publication date: 6 February 2024
Copyright license: Creative Commons Attribution 4.0 International
This dataset includes all escott calculations for the ProteinGym dataset. The calculations were performed with v1.6.0 of the PRESCOTT package. The ProteinGym dataset contains 72 proteins, on which 87 experiments were performed. This dataset is organized into three folders:
1-predictions
2-analyses
3-figures-and-csv-files

The 'predictions' folder has 72 subdirectories, one for each protein. Let's use BLAT as an example to show the most essential files in each protein folder inside 'predictions'. You can find the following files in the BLAT_ECOLX_full_11 directory:
* aliBLAT.fasta (input file): This is the MSA file used for the calculations; it was obtained with Colabfold.
* ranked_0.pdb (input file): This is the pdb model that the escott algorithm uses to deduce structural parameters.
* BLAT_ECOLX_Stiffler_2015.mut (input file): This file is a list of mutations in simple text format.
* BLAT_jet.res (data file): This file contains the trace (tjet), cv and pc parameters for each amino acid in the protein. This data is produced and used by escott.
* BLAT_ECOLX_Stiffler_2015_normPred_evolCombi.txt (output file): This file contains the raw (unprocessed) escott predictions. It is the most important output of the escott algorithm.
* BLAT_ECOLX_Stiffler_2015_singleline.csv (experimental data file): We compare our escott predictions to the experimental measurements given in this file.

The 'analyses' folder contains 3 types of analyses that we used in our study:
1-secondary-structure-analysis
2-escott-averages-analysis
3-amino-acid-type-analysis

The 'figures-and-csv-files' folder contains png files and their data in csv format.
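The per-protein file layout described for BLAT can be checked programmatically after downloading the archive. The following is a minimal sketch, not part of the dataset or of the PRESCOTT package. The glob patterns are generalized from the BLAT file names, and it is an assumption that the other protein folders follow the same naming. The function names `inventory` and `missing_roles` are hypothetical helpers introduced for illustration only.

```python
from pathlib import Path

# Glob patterns generalized from the BLAT example. That every protein folder
# follows the same naming scheme is an assumption, not a documented guarantee.
EXPECTED_PATTERNS = {
    "msa": "ali*.fasta",                       # Colabfold MSA (input)
    "structure": "ranked_0.pdb",               # pdb model (input)
    "mutations": "*.mut",                      # mutation list (input)
    "jet": "*_jet.res",                        # tjet, cv and pc parameters (data)
    "predictions": "*_normPred_evolCombi.txt", # raw escott predictions (output)
    "experiment": "*_singleline.csv",          # experimental measurements
}


def inventory(protein_dir):
    """Map each file role to the sorted file names found in protein_dir."""
    protein_dir = Path(protein_dir)
    return {
        role: sorted(p.name for p in protein_dir.glob(pattern))
        for role, pattern in EXPECTED_PATTERNS.items()
    }


def missing_roles(protein_dir):
    """Return the roles for which no matching file exists in protein_dir."""
    found = inventory(protein_dir)
    return [role for role, names in found.items() if not names]
```

Calling `missing_roles` on a protein directory such as BLAT_ECOLX_full_11 should return an empty list if all six file types are present. A folder covering several experiments may list more than one file under the "mutations", "predictions" and "experiment" roles.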