A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts

From MaRDI portal
Dataset: 6703825



DOI: 10.5281/zenodo.11612556 | Zenodo: 11612556 | MaRDI QID: Q6703825 | FDO: Q6703825

Dataset published in the Zenodo repository.

Stefano Teso, Andrea Passerini, Emanuele Marconato, Samuele Bortolotti, Paolo Morettin, Emile van Krieken, Antonio Vergari, Tommaso Carraro

Publication date: 12 June 2024

Copyright license: Creative Commons Attribution-ShareAlike 4.0 International



Codebase [GitHub] | Dataset [Zenodo]

Abstract

The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research has observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts with the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high-quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available on GitHub.

Usage

We recommend visiting the official code website for instructions on how to use the dataset and the accompanying software code. Two minimal, hypothetical loading sketches are given after the references below.

License

All ready-made and generated datasets are distributed under the CC-BY-SA 4.0 license, with the exception of Kand-Logic, which is derived from Kandinsky-patterns and is therefore distributed under the GPL-3.0 license.

Datasets Overview

CLIP-embeddings. This folder contains the saved activations of a pretrained CLIP model applied to the tested dataset. It includes embeddings that represent the dataset in a format suitable for further analysis and experimentation.

BDD_OIA-original-dataset. This directory holds the original files from the X-OIA project by Xu et al. [1]. These datasets have been made publicly available for ease of access and further research. If you use them, please consider citing the original authors.

kand-logic-3k. This folder contains all images generated for the Kand-Logic project. Each image is accompanied by annotations for both concepts and labels.

bbox-kand-logic-3k. This directory contains Kand-Logic images that have undergone a preprocessing step: objects are cropped according to their bounding boxes, rescaled, and annotated with concepts and labels.

sdd-oia. This folder includes all images and labels generated using rsbench.

sdd-oia-embeddings. This directory contains 512-dimensional embeddings extracted with a ResNet18 model pretrained on ImageNet. The embeddings are derived from the sdd-oia dataset.

BDD-OIA-preprocessed. This folder contains preprocessed data following the methodology of Sawada and Nakamura [2]: 2048-dimensional embeddings extracted with a Faster-RCNN model pretrained on the BDD-100k dataset. The original BDD datasets can be downloaded from the following Google Drive link: [Download BDD Dataset].

References

[1] Xu et al., *Explainable Object-Induced Action Decision for Autonomous Vehicles*, CVPR 2020.
[2] Sawada and Nakamura, *Concept Bottleneck Model With Additional Unsupervised Concepts*, IEEE Access, 2022.
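Loading sketch (images and annotations). The following is a minimal sketch, not the official rsbench loader, of how a Kand-Logic sample might be read. It assumes the images are PNG files with per-sample JSON annotations stored alongside them; the directory name, file stems, and annotation schema are hypothetical, so consult the official code website for the actual layout and loaders.

```python
# Minimal, hypothetical sketch for reading one Kand-Logic sample (image plus
# concept/label annotations). File names and the JSON schema are assumptions,
# not the official rsbench format.
import json
from pathlib import Path

from PIL import Image  # pip install pillow

DATA_DIR = Path("kand-logic-3k")  # hypothetical extraction directory


def load_sample(stem: str):
    """Return (image, concepts, label) for a single sample."""
    image = Image.open(DATA_DIR / f"{stem}.png").convert("RGB")
    with open(DATA_DIR / f"{stem}.json") as f:
        ann = json.load(f)  # hypothetical annotation file
    return image, ann.get("concepts"), ann.get("label")


if __name__ == "__main__":
    img, concepts, label = load_sample("sample_0000")  # hypothetical file stem
    print(img.size, concepts, label)
```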
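Loading sketch (precomputed embeddings). This sketch illustrates how the embedding folders (sdd-oia-embeddings, BDD-OIA-preprocessed) could be consumed, assuming the activations are serialized as NumPy .npy arrays; the file names and serialization format are assumptions, so verify them against the archives and the official code website before relying on this.

```python
# Minimal, hypothetical sketch for working with the precomputed embeddings.
# The .npy file names are assumptions; only the embedding dimensions (512 for
# ResNet18, 2048 for Faster-RCNN) come from the dataset description above.
import numpy as np

sdd_oia_emb = np.load("sdd-oia-embeddings/embeddings.npy")    # expected shape (N, 512)
bdd_oia_emb = np.load("BDD-OIA-preprocessed/embeddings.npy")  # expected shape (N, 2048)

# Sanity-check the feature dimensions stated in the dataset overview.
assert sdd_oia_emb.shape[1] == 512    # ResNet18 (ImageNet) features
assert bdd_oia_emb.shape[1] == 2048   # Faster-RCNN (BDD-100k) features

print(sdd_oia_emb.shape, bdd_oia_emb.shape)
```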







This page was built for dataset: A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts