M100 dataset: time-aggregated data for anomaly detection
DOI: 10.5281/zenodo.7541722
Zenodo: 7541722
MaRDI QID: Q6724898
FDO: Q6724898
Dataset published in the Zenodo repository.
Massimiliano Guarrasi, Daniela Galetti, Mirko Cestari, Luca Benini, Mohsen Seyedkazemi Ardebili, Andrea Borghesi, Francesco Barchi, Andrea Bartolini, Alessio Mauri, Francesco Beneventi, Martin Molan, Carmine di Santi
Publication date: 31 January 2023
Copyright license: Creative Commons Attribution 4.0 International
This entry is part of a larger data set collected from the most recent Tier-0 supercomputer hosted at CINECA (Marconi100, https://www.hpc.cineca.it/hardware/marconi100). The data covers the entire system. It ranges from information internal to the 980+ computing nodes, such as core loads, temperatures, frequencies, memory write/read operations, CPU power consumption, fan speed, and GPU usage details, to system-wide information, including the liquid cooling infrastructure, the air conditioning system, the power supply units, workload manager statistics, job-related information, system status alerts, and the weather forecast. It comprises hundreds of metrics measured on each computing node, in addition to hundreds of other metrics gathered from sensors that monitor all system components. This particular dataset is intended for anomaly detection: it contains the same data as the main dataset, but aggregated over time, with one Parquet file for each node. The data is distributed in tarballs, each of which includes the files for all nodes in a given rack. In each file, the rows represent 15-minute periods and the columns hold aggregated values (average, standard deviation, minimum, maximum) of each IPMI metric available for the node; an additional column contains anomaly labels from Nagios. More details, including the spatial distribution of the nodes in the room, can be found in the companion repository: https://gitlab.com/ecs-lab/exadata.
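To make the row layout concrete, the following minimal sketch shows how raw readings of a single metric could be reduced to one row per 15-minute period with the four aggregates named above. It is an illustration only, not the pipeline used to build the dataset (which is documented in the companion repository). The field names `timestamp`, `avg`, `std`, `min`, and `max`, the sample readings, and the choice of the sample standard deviation are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev

WINDOW_MINUTES = 15  # aggregation period stated in the dataset description


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute window."""
    floored_minute = ts.minute - ts.minute % WINDOW_MINUTES
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def aggregate(samples):
    """Group (timestamp, value) pairs into 15-minute windows.

    Returns one dict per window, sorted by window start. The key names are
    hypothetical and need not match the column names in the Parquet files.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[window_start(ts)].append(value)

    rows = []
    for start in sorted(buckets):
        values = buckets[start]
        rows.append({
            "timestamp": start,
            "avg": mean(values),
            # Assumption: sample standard deviation; 0.0 for a single reading.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        })
    return rows


# Hypothetical readings of one metric, one sample every 5 minutes.
t0 = datetime(2022, 1, 1, 0, 0)
samples = [(t0 + timedelta(minutes=5 * i), 40.0 + i) for i in range(6)]
rows = aggregate(samples)
for row in rows:
    print(row)
```

In the published files each IPMI metric would contribute its own set of aggregate columns, alongside the Nagios anomaly label for the same period.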