A multi-species benchmark for training and validating large scale mass spectrometry proteomics machine learning models
DOI: 10.5281/zenodo.13685813; Zenodo: 13685813; MaRDI QID: Q6708688; FDO: Q6708688
Dataset published in the Zenodo repository.
Publication date: 4 September 2024
Copyright license: Creative Commons Attribution 4.0 International
This is a de novo sequencing benchmark dataset derived from nine publicly available mass spectrometry datasets. There are two versions of the benchmark: main and balanced. The balanced version randomly eliminates some spectra associated with some species in order to create a smaller, more evenly balanced dataset. Also provided are two zip files containing the raw data as well as intermediate results. Details about how the benchmark was created are provided in an associated Zenodo release, which contains the source code as well as a manuscript describing the benchmark. This release fixes a bug that incorrectly detected shared peptides between different species. It also includes the annotated spectra in mzSpecLib format.