SPASS dataset: A synthetic polyphonic dataset with spatiotemporal labels of sound sources

From MaRDI portal



DOI: 10.5281/zenodo.8239067 · Zenodo: 8239067 · MaRDI QID: Q6723786 · FDO: Q6723786

Dataset published at the Zenodo repository.

P. Huijse, Diego Vergara, Enrique Suárez, Victor Poblete, Jorge P. Arenas, Diego Espejo, Victor Vargas, Rhoddy Viveros-Muñoz, Matthieu Vernier

Publication date: 26 December 2022

Copyright license: Creative Commons Attribution 4.0 International



SPASS is a synthetic polyphonic dataset consisting of 10-second audio segments from five acoustic scenes:

- Park
- Square
- Street
- Waterfront
- Market

Each acoustic scene comprises 5,000 audio recordings with corresponding metadata. The recordings were created in a 3D acoustic simulation environment (RAVEN, https://www.virtualacoustics.org/RAVEN/). SPASS was built as a training dataset for the FuSA system (https://www.acusticauach.cl/fusa/) and targets polyphonic Sound Event Detection (SED) tasks. The metadata files include the class of each sound event, its onset and offset in time, its position in space (Cartesian coordinates), and its final position if the source was moving. This research was funded by ANID FONDEF grant number ID20I10333.
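The per-event annotations described above (class label, onset/offset, Cartesian position, and an optional final position for moving sources) can be modeled as simple records for SED experiments. The sketch below is an illustration only: the field names (`label`, `onset`, `end_position`, etc.) are assumptions for clarity, not the dataset's actual metadata column names, and the events are invented examples.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class SoundEvent:
    """One annotated source in a 10-second segment (hypothetical schema)."""
    label: str                       # sound event class
    onset: float                     # start time in seconds
    offset: float                    # end time in seconds
    position: Vec3                   # Cartesian (x, y, z) start position
    end_position: Optional[Vec3] = None  # set only if the source moves

    def position_at(self, t: float) -> Vec3:
        """Linearly interpolate the position of a moving source at time t."""
        if self.end_position is None or self.offset == self.onset:
            return self.position
        a = min(max((t - self.onset) / (self.offset - self.onset), 0.0), 1.0)
        return tuple(p + a * (q - p)
                     for p, q in zip(self.position, self.end_position))

def active_events(events: List[SoundEvent], t: float) -> List[SoundEvent]:
    """Events sounding at time t; overlaps are expected in polyphonic audio."""
    return [e for e in events if e.onset <= t <= e.offset]

# Invented example events, not taken from the dataset:
events = [
    SoundEvent("dog_bark", 1.0, 3.0, (2.0, 0.0, 1.5)),
    SoundEvent("car", 0.5, 9.5, (-10.0, 5.0, 0.0),
               end_position=(10.0, 5.0, 0.0)),
]
print([e.label for e in active_events(events, 2.0)])  # both overlap at t=2 s
```

Linear interpolation between start and end positions is one plausible way to recover a moving source's trajectory from only its endpoints; the actual motion model used by the simulation may differ.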






