A multi-species benchmark for training and validating large scale mass spectrometry proteomics machine learning models

From MaRDI portal




This is a de novo sequencing benchmark dataset derived from nine publicly available mass spectrometry datasets. There are two versions of the benchmark: main and balanced. The balanced version randomly eliminates some spectra associated with some species in order to create a smaller, more evenly balanced dataset. Also provided are two zip files containing the raw data as well as intermediate results. Details about how the benchmark was created are provided in an associated Zenodo release, which contains the source code as well as a manuscript describing the benchmark. This release fixes a bug that incorrectly detected shared peptides between different species. It also includes the annotated spectra in mzSpecLib format.
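
The idea behind the balanced version, randomly removing spectra from over-represented species, can be illustrated with a minimal sketch. This is not the benchmark's actual procedure, which is documented in the associated Zenodo release. The function name `balance_by_species`, the per-species `cap` parameter, the fixed seed, and the toy records are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def balance_by_species(spectra, cap, seed=0):
    """Randomly drop spectra so that no species keeps more than `cap` of them.

    `spectra` is an iterable of (spectrum_id, species) pairs. The cap-based
    rule is an assumption for illustration, not the benchmark's documented rule.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible

    # Group spectrum identifiers by the species they belong to.
    by_species = defaultdict(list)
    for spectrum_id, species in spectra:
        by_species[species].append(spectrum_id)

    balanced = []
    for species in sorted(by_species):
        ids = by_species[species]
        # Species at or under the cap are kept whole; larger ones are subsampled.
        kept = ids if len(ids) <= cap else rng.sample(ids, cap)
        balanced.extend((spectrum_id, species) for spectrum_id in kept)
    return balanced


# Toy input: 10 spectra from one species and 3 from another (made-up identifiers).
toy = [(f"human_{i}", "human") for i in range(10)]
toy += [(f"yeast_{i}", "yeast") for i in range(3)]
balanced = balance_by_species(toy, cap=4)
```

With a cap of 4, the toy input keeps 4 of the 10 spectra from the larger species and all 3 from the smaller one, giving a smaller and more even set.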










