DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature
DOI: 10.5281/zenodo.6961986, Zenodo: 6961986, MaRDI QID: Q6700199, FDO: Q6700199
Dataset published at Zenodo repository.
Henning Müller, Dhrangadhariya Anjani
Publication date: 27 April 2022
Datasets
DISTANT-CTO is a weakly labelled dataset of sentences annotated with Intervention and Comparator entities. The dataset was obtained using the candidate generation approach described in DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low Resource Entity Extraction Using Clinical Trials Literature.

distantcto_high_conf.txt - ds conf 1.0 (full dataset)
extraction1_pos_posnegtrail_conf09.txt - ds conf 0.9 (partial dataset)

The physio test set comprises 153 PICO-annotated randomized controlled trial abstracts from Physiotherapy and Rehabilitation. It was used as an additional benchmark to evaluate how well the weakly annotated dataset and the NER model generalize to this sub-domain.

Utility
The dataset can be used as input for training Intervention named-entity recognition (NER) models.

Availability
This directory includes:
extraction1_pos_posnegtrail_conf09.txt - This text data file contains all the weak annotations (source intervention terms mapped onto target sentences) from clinicaltrials.org (CTO) with a confidence score of 0.9 and above.
physio_sent_annot2POS_posnegtrail.txt - This data file contains manually annotated (Intervention entity) data from the physiotherapy and rehabilitation domain. It follows a roughly similar structure as described in the Description for long targets section. Participant and Outcome annotations have been removed from this file.
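To make the phrase "source intervention terms mapped onto target sentences" concrete, the following is a minimal sketch of weak labelling by exact, case-insensitive token matching. It is an illustration only and not the authors' candidate generation method, which assigns confidence scores; the function name weak_label, the 0/1 label scheme, and the example sentence are all assumptions made here and do not reflect the format of the data files.

```python
def weak_label(sentence_tokens, term_tokens):
    """Return one 0/1 label per sentence token: 1 where the source term occurs.

    Hypothetical helper for illustration; exact matching only, no confidence score.
    """
    labels = [0] * len(sentence_tokens)
    sent = [t.lower() for t in sentence_tokens]
    term = [t.lower() for t in term_tokens]
    n = len(term)
    if n == 0:
        return labels  # nothing to map
    # Slide a window of the term's length over the sentence.
    for i in range(len(sent) - n + 1):
        if sent[i:i + n] == term:
            for j in range(i, i + n):
                labels[j] = 1
    return labels


# Made-up example: a source intervention term and a target sentence.
sentence = "Patients received oral ibuprofen twice daily".split()
term = "oral ibuprofen".split()
labels = weak_label(sentence, term)
print(list(zip(sentence, labels)))
```

Token-level labels of this kind are one common input shape for training an NER model; a real pipeline would typically also handle partial or fuzzy matches and keep a per-annotation confidence value.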