DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature

From MaRDI portal
Dataset:6700199



DOI: 10.5281/zenodo.6961986; Zenodo: 6961986; MaRDI QID: Q6700199; FDO: Q6700199

Dataset published in the Zenodo repository.

Henning Müller, Dhrangadhariya Anjani

Publication date: 27 April 2022



Datasets

DISTANT-CTO is a weakly labelled dataset of sentences annotated with Intervention and Comparator entities. The dataset was obtained using the candidate-generation approach described in DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature.

distantcto_high_conf.txt - ds conf 1.0 (full dataset)
extraction1_pos_posnegtrail_conf09.txt - ds conf 0.9 (partial dataset)

The physio test set is a dataset of 153 PICO-annotated randomized controlled trial abstracts from the physiotherapy and rehabilitation domain. It was used as an additional benchmark to evaluate how well the weakly annotated dataset and the NER model generalize to this sub-domain.

Utility

The dataset can be used as input for training Intervention named-entity recognition (NER) models.

Availability

This directory includes:

extraction1_pos_posnegtrail_conf09.txt - This text file contains all weak annotations (source intervention terms mapped onto target sentences) from ClinicalTrials.gov (CTO) with a confidence score of 0.9 or above.
physio_sent_annot2POS_posnegtrail.txt - This file contains manually annotated Intervention entities from the physiotherapy and rehabilitation domain. Its structure roughly follows the one described in the Description for long targets section. (Participant and Outcome annotations have been removed from this file.)
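To illustrate what "source intervention terms mapped onto target sentences" with a confidence cut-off can look like when preparing NER training data, here is a minimal sketch. It is not the authors' candidate-generation method and does not parse the released files, whose exact format is not described here. Everything in it is an assumption: the `WeakAnnotation` record, the regex tokeniser, exact case-insensitive matching, the hypothetical `INT` label, and the example sentences are illustrative only.

```python
"""Hypothetical sketch: turning weak intervention annotations into BIO tags.

Assumptions (not taken from the DISTANT-CTO files): each weak annotation is a
(sentence, term, confidence) record, matching is exact and case-insensitive on
regex tokens, and "INT" is an illustrative label name.
"""
import re
from dataclasses import dataclass


@dataclass
class WeakAnnotation:
    sentence: str      # target sentence
    term: str          # source intervention term
    confidence: float  # hypothetical mapping confidence in [0, 1]


def tokenize(text: str) -> list[str]:
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def bio_tags(sentence: str, term: str) -> list[tuple[str, str]]:
    """Tag non-overlapping, left-to-right exact matches of `term` in `sentence`."""
    tokens = tokenize(sentence)
    term_tokens = [t.lower() for t in tokenize(term)]
    tags = ["O"] * len(tokens)
    n = len(term_tokens)
    if n == 0:  # an empty term labels nothing
        return list(zip(tokens, tags))
    i = 0
    while i <= len(tokens) - n:
        if [t.lower() for t in tokens[i:i + n]] == term_tokens:
            tags[i] = "B-INT"
            for j in range(i + 1, i + n):
                tags[j] = "I-INT"
            i += n  # skip past the match so spans never overlap
        else:
            i += 1
    return list(zip(tokens, tags))


def filter_by_confidence(annotations: list[WeakAnnotation],
                         threshold: float = 0.9) -> list[WeakAnnotation]:
    """Keep weak annotations whose confidence is at or above `threshold`."""
    return [a for a in annotations if a.confidence >= threshold]


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    anns = [
        WeakAnnotation("Patients received low dose aspirin daily.", "low dose aspirin", 1.0),
        WeakAnnotation("Placebo was given to controls.", "placebo", 0.85),
    ]
    for a in filter_by_confidence(anns, threshold=0.9):
        print(bio_tags(a.sentence, a.term))
```

Raising `threshold` to 1.0 mirrors the stricter "ds conf 1.0" subset in spirit, trading recall for cleaner weak labels; real distant supervision pipelines usually add fuzzier matching than this exact-match stand-in.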
