Unprocessed data from the Jungle Weather Zooniverse citizen science project
DOI: 10.5281/zenodo.7104838 | Zenodo: 7104838 | MaRDI QID: Q6686342 | FDO: Q6686342
Dataset published at Zenodo repository.
Publication date: 22 September 2022
Copyright license: Creative Commons Attribution 4.0 International
The Jungle Weather project aimed to transcribe weather observations recorded between 1949 and 1958 in the tropical rainforest of the Democratic Republic of the Congo. Long-term observations of tropical weather are rare. Jungle Weather, part of the COBECORE project, contains three decades of weather observations from the central African tropical forest. It is therefore an extraordinary source of information for understanding, for example, the drought resilience of tree species.

Summary
Both the input and the output of the citizen science transcriptions are provided in this dataset. This includes the original cut-outs as used in the Zooniverse project and the output generated by the Zooniverse data export routines. These export routines produce CSV output in which the content of each classification is stored in JSON subfields. In addition, we provide the exported subject list and the details of each workflow. In total the project output consists of four files, roughly 3 GB in data volume:
- transcribe-climate-data-classifications.csv (annotations of the table cells)
- transcribe-meta-data-classifications.csv (annotations of the table headers)
- jungle-weather-workflows.csv (description of the citizen science workflows)
- jungle-weather-subjects.csv (list of all images transcribed, and their online location for validation / referencing)

Context
Our understanding of forest ecosystem responses to climate change relies on consistent long-term observations that provide baseline measurements. In the central Congo Basin, established long-term observation programs are rare. Meteorologically, the central Congo Basin is currently represented by only a few rain gauges, which limits climate forecasts across the Congo Basin and the central African continent. This lack of long-term (historical) climatological data leaves the central Congo Basin spatially and temporally under-represented.
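The CSV files with JSON subfields listed in the Summary can be read with Python's standard library alone. The sketch below is a minimal, hypothetical example: it assumes the column names of a standard Zooniverse project export (`subject_ids` and a JSON-encoded `annotations` column in the classification files; `subject_id` and a JSON-encoded `locations` column in the subject list), so check these against the actual file headers before relying on it. The function names, demo rows and URL are illustrative only.

```python
import csv
import io
import json
from collections import defaultdict

# ASSUMPTION: column names follow the standard Zooniverse export layout.
#   classifications: ..., subject_ids, annotations (JSON list of tasks), ...
#   subjects:        subject_id, ..., locations (JSON object: frame -> URL), ...


def read_classifications(fh):
    """Yield (subject_id, annotations) pairs from a classification export.

    The `annotations` cell is a JSON string; it is decoded into Python
    objects (assumed: a list of {"task": ..., "value": ...} dicts).
    """
    for row in csv.DictReader(fh):
        yield row["subject_ids"], json.loads(row["annotations"])


def read_subject_locations(fh):
    """Map each subject_id to the URL of its first image frame."""
    locations = {}
    for row in csv.DictReader(fh):
        urls = json.loads(row["locations"])
        # Frame keys are numeric strings ("0", "1", ...); take the lowest one.
        first_key = min(urls, key=int)
        locations[row["subject_id"]] = urls[first_key]
    return locations


def values_by_image(classification_fh, subject_fh):
    """Group every transcribed value by the image it was made on."""
    locations = read_subject_locations(subject_fh)
    grouped = defaultdict(list)
    for subject_id, annotations in read_classifications(classification_fh):
        url = locations.get(subject_id)  # None if the subject is not listed
        for task in annotations:
            grouped[url].append(task["value"])
    return dict(grouped)


if __name__ == "__main__":
    # Made-up rows in the assumed export layout (CSV doubles inner quotes).
    classifications = io.StringIO(
        'classification_id,subject_ids,annotations\n'
        '1,101,"[{""task"": ""T0"", ""value"": ""23.5""}]"\n'
        '2,101,"[{""task"": ""T0"", ""value"": ""23.6""}]"\n'
    )
    subjects = io.StringIO(
        'subject_id,locations\n'
        '101,"{""0"": ""https://example.org/cell_101.jpg""}"\n'
    )
    print(values_by_image(classifications, subjects))
    # {'https://example.org/cell_101.jpg': ['23.5', '23.6']}
```

Grouping by image URL keeps each transcribed value traceable to the cut-out it came from, which is the validation / referencing role of the subject list. Because the files are raw, any repeated transcriptions of the same cell are kept as separate values rather than merged.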
However, old climate records could provide valuable information about previous growing conditions of the forest. Large amounts of ecological and climatological data, covering approximately five decades (~1910-1960), exist as unexplored heritage stored in various Belgian federal archives and collections. As part of a larger project called Congo Basin eco-climatological data recovery and valorization (COBECORE), the Jungle Weather project will help transcribe historical climatological data measured throughout the Congo Basin. These data will in part complement the completed Jungle Rhythms Zooniverse project, further valorizing the data transcribed there.

Historical data
Within this project we will focus on data recorded throughout the tropical part of what is currently the Democratic Republic of the Congo (DRC). The area we will cover is shown above in the map as an open polygon. The project will not cover the southern province of Katanga (red crosshatches), as the climate there transitions from tropical to humid subtropical. The historical data are archived in the Belgian State Archives, which hold almost all records regarding colonial affairs, ranging from communications about trade to the raw data digitized within the context of the Jungle Weather project. Row upon row of data is stored in the basement. Below you see part of the INEAC (Institut National pour l'Étude Agronomique du Congo belge) archive, which holds all climatological records. These records were noted rigorously on carbon copy paper. However, because the data are handwritten (and because of the volume involved), automated processing is not possible. Although optical character recognition (OCR) works well on printed text, the high variability of the handwritten characters and the low-contrast pencil markings cause current automated approaches to fail.
Similar to the Old Weather project, and in the spirit of the Jungle Rhythms project, a keen eye is required to decipher the numbers written on these sheets.

Pre-processing / digitization
The project provided citizen scientists with digital pictures of the original sheets. Scanning these climate data sheets was a laborious process; in total, more than 70 000 records were digitized. Unlike the Old Weather project, we did not require citizen scientists to outline valid sections of the sheet, as this part of the processing has been automated. We refer to our Jungle Weather pre/post-processing repository for more details and example code. Once digitized and properly aligned, the whole record was divided into an estimated 30 million cells and 70 000 header files. Below you find an example of a header file and a table cell. During the Jungle Weather project we selected a subset of ~300K table cells for transcription, in an effort to validate subsequent machine-learning-based, automated transcription approaches. All data were transcribed by citizen scientists in the spring and summer of 2020.

Notes
The provided data are raw data, and expert knowledge is required to interpret them correctly. Please contact the authors for the proper context if you are interested in using these data in your project.
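The step described under Pre-processing / digitization, in which each aligned record is divided into table cells, can be illustrated with a simple grid cut. This is a hypothetical sketch and not the method used in the Jungle Weather pre/post-processing repository: it assumes the pixel positions of the ruled row and column lines on an already aligned sheet are known (`row_edges`, `col_edges`, both illustrative names) and returns one bounding box per cell.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    """Bounding box of one table cell, in pixel coordinates."""
    row: int
    col: int
    left: int
    top: int
    right: int
    bottom: int


def cut_grid(row_edges: List[int], col_edges: List[int]) -> List[Cell]:
    """Return one Cell per table cell, ordered row by row.

    HYPOTHETICAL: row_edges / col_edges are assumed pixel positions of the
    ruled lines on an aligned sheet, sorted ascending. n edges give n - 1
    rows (or columns).
    """
    if row_edges != sorted(row_edges) or col_edges != sorted(col_edges):
        raise ValueError("edges must be sorted ascending")
    cells = []
    # Pair consecutive edges: (e0, e1), (e1, e2), ... bound one row/column each.
    for r, (top, bottom) in enumerate(zip(row_edges, row_edges[1:])):
        for c, (left, right) in enumerate(zip(col_edges, col_edges[1:])):
            cells.append(Cell(r, c, left, top, right, bottom))
    return cells


if __name__ == "__main__":
    cells = cut_grid([0, 40, 80], [0, 100, 200, 300])
    print(len(cells))  # 6
    print(cells[4])    # Cell(row=1, col=1, left=100, top=40, right=200, bottom=80)
```

The boxes are given as (left, top, right, bottom), the order Pillow's `Image.crop` expects, so each cell could be cut from an aligned sheet image with `img.crop((cell.left, cell.top, cell.right, cell.bottom))`.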