WhichDog: A crowdsourced dataset including candidate set-based labelling

DOI: 10.5281/zenodo.7100698; Zenodo: 7100698; MaRDI QID: Q6704497; FDO: Q6704497

Dataset published at the Zenodo repository.

Jerónimo Hernández-González, Iker Beñaran-Muñoz, Aritz Pérez

Publication date: 22 September 2022

Copyright license: Creative Commons Attribution 4.0 International

A dataset with crowdsourced labels for label aggregation and supervised classification. It contains 400 images of dogs from the Stanford Dogs dataset (http://vision.stanford.edu/aditya86/ImageNetDogs/), covering 32 different breeds (classes). Annotators were asked to provide two types of labelling: full labelling (the labeler provides a single label for each image) and candidate labelling (the labeler may provide a set of candidate labels for each image). The dataset includes a total of 61227 annotations (30628 full and 30599 candidate) obtained from 1028 different labelers. The labels were collected through the online crowdsourcing platform Amazon Mechanical Turk (mTurk) thanks to funds provided by the Basque Government through the Elkartek program (KK-2018/00071).

The assignments were designed as sequences of 64 images that were given to the annotators. Each image in the sequence was presented together with a specific subset of possible labels (with the number of options ranging from 4 to 32) and an instruction for the annotator to perform a specific type of labelling (full or candidate). Each labeler performed at least one assignment. Not all labelers completed the 64 annotations in their assignments.

The file whichdog.zip contains a folder (images) with the 400 images of dogs, a text file (breed_names.txt) that lists the names of the breeds and their assigned labels (a number in the interval from 0 to 31), and a CSV file (whichdog_all_annots.csv) that contains the information about the annotations. Each row of the CSV file represents a single annotation, and the columns are:

- image_id: ID number of the image.
- is_candidate: indicates whether the requested labelling is full (0) or candidate (1).
- labeler_id: ID number of the labeler.
- time: time taken by the labeler to perform the annotation.
- answer: label or set of labels provided by the labeler as annotation.
- options: subset of possible labels shown to the labeler.
- assignment_id: ID number of the assignment.
- sequence_point: position in the assignment's sequence of images at which the annotation was provided.
- class: ground truth label of the image.
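As an illustration of how the columns above might be used, the following minimal Python sketch aggregates the full-labelling annotations by per-image majority vote and compares the result against the class column. Majority voting is used here only as a common baseline, not as a method prescribed by the dataset authors. The sample rows are made up for the example, and the space-separated encoding of label sets in answer and options is an assumption; the actual encoding in whichdog_all_annots.csv should be checked before reuse.

```python
import csv
import io
from collections import Counter, defaultdict


def majority_vote(rows):
    """Return {image_id: most frequent label} using full-labelling rows only."""
    votes = defaultdict(Counter)
    for row in rows:
        # is_candidate == 0 marks a full (single-label) annotation.
        if row["is_candidate"] == "0":
            votes[row["image_id"]][row["answer"].strip()] += 1
    # Ties are broken by first occurrence (Counter.most_common behaviour).
    return {img: counts.most_common(1)[0][0] for img, counts in votes.items()}


def accuracy(predicted, rows):
    """Fraction of aggregated labels that match the ground truth column."""
    truth = {row["image_id"]: row["class"].strip() for row in rows}
    hits = sum(1 for img, label in predicted.items() if truth[img] == label)
    return hits / len(predicted) if predicted else 0.0


# Made-up sample with the documented column names.
# ASSUMPTION: label sets are space-separated; the real file may differ.
SAMPLE = """\
image_id,is_candidate,labeler_id,time,answer,options,assignment_id,sequence_point,class
0,0,10,4.2,3,3 7 12 20,100,1,3
0,0,11,3.1,3,3 7 12 20,101,5,3
0,0,12,6.0,7,3 7 12 20,102,9,3
0,1,13,5.5,3 7,3 7 12 20,103,2,3
1,0,10,2.9,12,5 12 18 30,100,2,18
1,0,11,3.3,12,5 12 18 30,101,6,18
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
predicted = majority_vote(rows)
print(predicted)
print(accuracy(predicted, rows))
```

To run the same sketch on the real data, the in-memory sample could be replaced by an opened whichdog_all_annots.csv, assuming its header row uses the column names listed above. Candidate annotations (is_candidate equal to 1) are skipped here because a set-valued answer needs a different aggregation rule.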
