The CrowdGleason dataset: learning the Gleason grade from crowds and experts




DOI: 10.5281/zenodo.14178894
Zenodo: 14178894
MaRDI QID: Q6701473
FDO: Q6701473

Dataset published in the Zenodo repository.

Alba Morquecho, Aurelio Martín-Castro, Javier Mateos, Rafael Molina, Arne Schmidt, Miguel López-Pérez, Fernando Pérez-Bueno

Publication date: 18 November 2024

Copyright license: Creative Commons Attribution 4.0 International



Introduction

This repository contains the files needed to replicate the study "The CrowdGleason dataset: Learning the Gleason grade from crowds and experts", published in Computer Methods and Programs in Biomedicine, Volume 257, December 2024, 108472. For further details on the study and the dataset, please see the published article.

CrowdGleason is a public prostate histopathological dataset consisting of 19,077 patches extracted from 1,045 whole slide images (WSIs) with various Gleason grades. The dataset was annotated using a crowdsourcing protocol involving seven pathologists-in-training to distribute the labeling effort. The dataset is divided into three sets for training, validation and testing: a training set with 13,824 patches of size 512×512 and a validation set with 2,327 patches, both extracted from 783 WSIs and annotated by one or more members of the crowd of seven pathologists-in-training, and a curated test set with 2,926 patches of size 512×512 extracted from 262 other WSIs, annotated by expert pathologists and by all the pathologists-in-training. Ground-truth labels for the curated test set were obtained by consensus between the expert pathologists and the majority of the pathologists-in-training.

Dataset files

The dataset consists of several files:

- Patches.zip: contains the 19,077 patches of CrowdGleason in three folders: train (13,824 patches used for training), val (2,327 patches used to validate the models) and test (2,926 patches that form the test set).
- NormalizedPatches.zip: contains the same patches as Patches.zip after the colour normalization preprocessing step. These files are useful to directly replicate the results of the paper.
- Annotations.zip: contains three .csv files with the crowdsourcing annotations. These files provide the train/val/test split as well as the label information. The 'markerX' columns hold the labels given by the X-th annotator; 'Patch filename' is the patch name in the train/val/test folder; 'ground truth' is the ground-truth label in the curated test set; MV, DS, GLAD and MACE are the labels obtained by aggregating the pathologists-in-training annotations with the majority voting, Dawid-Skene, GLAD and MACE methods, respectively (a minimal loading sketch is given below, after the funding note).

To enhance the reproducibility of the experiments in our paper, we have also included the files corresponding to the external dataset SICAPv2:

- NormalizedSICAPv2.zip: contains the normalized patches of the external dataset SICAPv2.
- NormalizedSICAPv2_Annotations.zip: contains the labels of the external dataset SICAPv2.

Citation

@article{LOPEZPEREZ2024108472,
  title = {The CrowdGleason dataset: Learning the Gleason grade from crowds and experts},
  journal = {Computer Methods and Programs in Biomedicine},
  volume = {257},
  pages = {108472},
  year = {2024},
  issn = {0169-2607},
  doi = {https://doi.org/10.1016/j.cmpb.2024.108472},
  url = {https://www.sciencedirect.com/science/article/pii/S0169260724004656},
  author = {Miguel López-Pérez and Alba Morquecho and Arne Schmidt and Fernando Pérez-Bueno and Aurelio Martín-Castro and Javier Mateos and Rafael Molina},
  keywords = {Computational pathology, Crowdsourcing, Prostate cancer, Gleason grade, Gaussian processes, Medical image analysis},
}

Funding

This work was supported in part by FEDER/Junta de Andalucía under project P20_00286 and by grant PID2022-140189OB-C22, funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU. The work by Miguel López-Pérez and Fernando Pérez-Bueno was supported by grants JDC2022-048318-I and JDC2022-048784-I, respectively, funded by MICIU/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR.
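
The following is a minimal sketch, not provided by the dataset authors, of how the annotation CSVs described above could be read and how a majority-vote label could be recomputed from the 'markerX' columns. The file name train.csv and the exact column spellings are assumptions for illustration; check the headers of the three .csv files inside Annotations.zip.

```python
# Minimal sketch: read one of the crowdsourcing annotation CSVs and recompute
# a majority-vote label from the per-annotator 'markerX' columns.
# File and column names follow the description above but are assumptions;
# verify them against the actual headers inside Annotations.zip.
import pandas as pd

# Hypothetical path to one of the three .csv files extracted from Annotations.zip.
df = pd.read_csv("Annotations/train.csv")

# Columns named marker1, marker2, ... hold the label given by each
# pathologist-in-training; a missing value means that annotator did not label the patch.
marker_cols = [c for c in df.columns if c.lower().startswith("marker")]

def majority_vote(row):
    """Return the most frequent label among the annotators who labeled this patch."""
    votes = row[marker_cols].dropna()
    if votes.empty:
        return None
    return votes.mode().iloc[0]  # ties broken arbitrarily by taking the first mode

df["MV_recomputed"] = df.apply(majority_vote, axis=1)

# If the file carries the aggregated 'MV' column described above, the recomputed
# labels should largely agree with it (up to tie-breaking).
if "MV" in df.columns:
    agreement = (df["MV_recomputed"] == df["MV"]).mean()
    print(f"Agreement with provided MV labels: {agreement:.3f}")
```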

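Similarly, a short sketch, under the same assumptions about file and column names, shows how a patch from the extracted train folder can be paired with its crowd annotations via the 'Patch filename' column. The extraction paths are hypothetical.

```python
# Minimal sketch: pair a patch image with its crowd labels, assuming Patches.zip
# and Annotations.zip have been extracted to the paths below (hypothetical).
from pathlib import Path

import pandas as pd
from PIL import Image

PATCH_DIR = Path("Patches/train")         # assumed extraction path for the train folder
ANNOTATION_CSV = "Annotations/train.csv"  # assumed filename inside Annotations.zip

df = pd.read_csv(ANNOTATION_CSV)

# 'Patch filename' is described above as the patch name in the train/val/test folder.
row = df.iloc[0]
patch_path = PATCH_DIR / row["Patch filename"]

image = Image.open(patch_path)  # patches are described above as 512x512
print(patch_path.name, image.size, row.filter(like="marker").dropna().to_dict())
```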






This page was built for dataset: The CrowdGleason dataset: learning the Gleason grade from crowds and experts