Data Sets and Results for "Improved data sets and evaluation methods for the automatic prediction of DNA-binding proteins"

From MaRDI portal




Data sets and results for "Improved data sets and evaluation methods for the automatic prediction of DNA-binding proteins".

The file dna_binding_protein_sequences.zip has the training and testing sets from the paper:

RLL - random_train/test_full_1000.csv
RSL - random_train/test_40.csv
RSLL - random_train/test_40_1000.csv
RLL where included positive examples have verified DNA binding activity - random_train/test_hq_1000.csv
The 10 RSLL data sets - random_train/test_40_1000.csv + random_train/test_40_1000_cv_0-8.csv

The results files are named similarly. See see_results.ipynb in the codebase that supplements these data sets.

The species data sets are derived from uniprot_data_bac.tab and uniprot_data_not_bac.tab. See code.

The ESM embeddings used by the XGBoost model are in dna_binding_protein_esm.zip.
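The naming pattern of the sequence files can be expanded programmatically. The following Python sketch is not part of the original codebase. It assumes that "random_train/test_X.csv" is shorthand for the two files random_train_X.csv and random_test_X.csv, and that "cv_0-8" means the nine suffixes cv_0 through cv_8. The helper names (split_names, rsll_suffixes, read_member) and the "RLL_HQ" key are hypothetical. The directory layout inside the archive and the CSV column names are not stated here, so the reader matches members by base name only and returns raw rows.

```python
import csv
import io
import zipfile

# Suffixes taken from the description above. Expanding "random_train/test_X.csv"
# into random_train_X.csv and random_test_X.csv is an assumption.
SUFFIXES = {
    "RLL": "full_1000",
    "RSL": "40",
    "RSLL": "40_1000",
    "RLL_HQ": "hq_1000",  # hypothetical key for the verified-activity RLL set
}


def split_names(suffix):
    """Return the assumed (train, test) file names for one suffix."""
    return f"random_train_{suffix}.csv", f"random_test_{suffix}.csv"


def rsll_suffixes():
    """Suffixes of the 10 RSLL splits: the base split plus cv_0 .. cv_8 (assumed)."""
    return ["40_1000"] + [f"40_1000_cv_{i}" for i in range(9)]


def read_member(zip_path, name):
    """Read one CSV from the archive as a list of rows, matching by base name."""
    with zipfile.ZipFile(zip_path) as zf:
        matches = [m for m in zf.namelist() if m.split("/")[-1] == name]
        if not matches:
            raise FileNotFoundError(f"{name} not found in {zip_path}")
        with zf.open(matches[0]) as fh:
            return list(csv.reader(io.TextIOWrapper(fh, encoding="utf-8")))
```

For example, split_names(SUFFIXES["RSLL"]) yields the assumed train and test file names for the RSLL split, and read_member("dna_binding_protein_sequences.zip", name) would load one of them if the archive is in the working directory and the assumed name matches.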










