Using novel data and ensemble models to improve automated labeling of Sustainable Development Goals

From MaRDI portal
Publication:57798

DOI: 10.48550/ARXIV.2301.11353 | arXiv: 2301.11353 | MaRDI QID: Q57798 | FDO: Q57798

Dominik S. Meier, Rui Mata, Dirk U. Wulff

Publication date: 25 January 2023

Abstract: A number of labeling systems based on text have been proposed to help monitor work on the United Nations (UN) Sustainable Development Goals (SDGs). Here, we present a systematic comparison of systems using a variety of text sources and show that systems differ considerably in their sensitivity (i.e., true-positive rate) and specificity (i.e., true-negative rate), have systematic biases (e.g., are more sensitive to specific SDGs relative to others), and are susceptible to the type and amount of text analyzed. We then show that an ensemble model that pools labeling systems alleviates some of these limitations, exceeding the labeling performance of all currently available systems. We conclude that researchers and policymakers should care about the choice of labeling system and that ensemble methods should be favored when drawing conclusions about the absolute and relative prevalence of work on the SDGs based on automated methods.







Cited In (1)





