CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting
DOI: 10.5281/zenodo.14883421; Zenodo: 14883421; MaRDI QID: Q6684804; FDO: Q6684804
Dataset published at Zenodo repository.
Rahel Laudien, Malte von Bloh, Lily-Belle Sweet, Andres Castellano, Dainius Masiliūnas, Aike Potze, Inti Luna, Dilli Paudel, Raed Hamed, Amit Kumar Srivastava, Abdelrahman Saleh, Ioannis N. Athanasiadis, Hilmy Baja, Allard de Wit, Siyabusa Mkuhlani, Vasileios Sitokonstantinou, Weston Anderson, Ron van Bree, Alex C. Ruane, Michiel Kallenberg, Petar Vojnović, Guanyuan Shuai, Maximilian Zachow, Stella Ofori-Ampofo, Donghoon Lee, Ritvik Sahajpal, Janet Mumo Mutuku, Jonathan Richetti, Pratishtha Poudel, Michele Meroni, Rogerio de Souza Noia Junior, Oumnia Ennaji, Robert Strong
Publication date: 18 February 2025
Overview

CY-Bench is a dataset and benchmark for subnational crop yield forecasting, with coverage of the major crop-growing countries of the world for maize and wheat. By subnational, we mean the administrative level at which yield statistics are published. When statistics are available for multiple levels, we pick the highest resolution. The dataset combines subnational yield statistics with relevant predictors, such as growing-season weather indicators, remote sensing indicators, evapotranspiration, soil moisture indicators, and static soil properties.

CY-Bench has been designed and curated by agricultural experts, climate scientists, and machine learning researchers from the AgML Community, with the aim of facilitating model intercomparison across the diverse agricultural systems around the globe in conditions as close as possible to real-world operationalization. Ultimately, by lowering the barrier to entry for ML researchers in this crucial application area, CY-Bench will facilitate the development of improved crop forecasting tools that can be used to support decision-makers in food security planning worldwide.

* Crops: Wheat, Maize
* Spatial Coverage: Wheat (29 countries), Maize (38 countries). See CY-Bench Summary for the list of countries.
* Temporal Coverage: Varies. See CY-Bench Summary.

Data

Data format

The benchmark data is organized as a collection of CSV files (with the exception of location information, see below), with each file representing a specific category of variable for a particular country. Each CSV file is named according to the category and the country it pertains to, facilitating easy identification and retrieval. The data within each CSV file is structured in tabular format, where rows represent observations and columns represent different predictors related to a category of variable.

Data content

All data files are provided as .csv.
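Because every file is plain CSV, a single table can be read with nothing more than the Python standard library. The sketch below assumes the naming pattern variable_croptype_country.csv and the crop/country folder layout described under Folder structure. The helper names `cybench_path` and `read_table` are illustrative only and are not part of the dataset or its tooling.

```python
import csv
from pathlib import Path


def cybench_path(root, variable, crop, country):
    """Build the expected path of one CY-Bench CSV file.

    Assumes the layout <root>/<crop>/<country>/<variable>_<crop>_<country>.csv,
    where <country> is an ISO 3166-1 alpha-2 code such as "AO".
    """
    return Path(root) / crop / country / f"{variable}_{crop}_{country}.csv"


def read_table(path):
    """Read a CSV file into a list of dicts, one dict per row (observation)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_table(cybench_path("cybench-data", "yield", "maize", "AO"))` would return one dictionary per row, keyed by column name. All values are read as strings and need converting to numbers before use.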
| Data | Description | Variables (units) | Temporal Resolution | Data Source (Reference) |
|---|---|---|---|---|
| crop_calendar | Start and end of growing season | sos (day of the year), eos (day of the year) | Static | World Cereal (Franch et al, 2022) |
| fpar | fraction of absorbed photosynthetically active radiation | fpar (%) | Dekadal (3 times a month; 1-10, 11-20, 21-31) | European Commission's Joint Research Centre (EC-JRC, 2024) |
| ndvi | normalized difference vegetation index | - | approximately weekly | MOD09CMG (Vermote, 2015) |
| meteo | temperature, precipitation (prec), radiation, potential evapotranspiration (et0), climatic water balance (= prec - et0) | tmin (°C), tmax (°C), tavg (°C), prec (mm), et0 (mm), cwb (mm), rad (J m-2 day-1) | daily | AgERA5 (Boogaard et al, 2022), FAO-AQUASTAT for et0 (FAO-AQUASTAT, 2024) |
| soil_moisture | surface soil moisture, rootzone soil moisture | ssm (kg m-2), rsm (kg m-2) | daily | GLDAS (Rodell et al, 2004) |
| soil | available water capacity, bulk density, drainage class | awc (cm m-1), bulk_density (kg dm-3), drainage_class (category) | static | WISE Soil database (Batjes, 2016) |
| yield | end-of-season yield | yield (t ha-1) | yearly | Various country or region specific sources (see crop_statistics_... in https://github.com/BigDataWUR/AgML-CY-Bench/tree/main/data_preparation) |

Folder structure

cybench-data: The CY-Bench dataset is structured at the first level by crop type and subsequently by country. For each country, the folder name follows the ISO 3166-1 alpha-2 two-character code. A separate .csv is available for each predictor dataset and for the crop calendar, as shown below. The CSV files are named to reflect the corresponding country and crop type, e.g.
variable_croptype_country.csv.

```
CY-Bench
│
└─── maize
│   │
│   └─── AO
│   │    -- crop_calendar_maize_AO.csv
│   │    -- fpar_maize_AO.csv
│   │    -- meteo_maize_AO.csv
│   │    -- ndvi_maize_AO.csv
│   │    -- soil_maize_AO.csv
│   │    -- soil_moisture_maize_AO.csv
│   │    -- yield_maize_AO.csv
│   │
│   └─── AR
│        -- crop_calendar_maize_AR.csv
│        -- fpar_maize_AR.csv
│        -- ...
│
└─── wheat
│   │
│   └─── AR
│   │    -- crop_calendar_wheat_AR.csv
│   │    -- fpar_wheat_AR.csv
│   │    ...
```

Example: CSV data content for maize in country X

```
X
└─── crop_calendar_maize_X.csv
│    -- crop_name (name of the crop)
│    -- adm_id (unique identifier for a subnational unit)
│    -- sos (start of crop season)
│    -- eos (end of crop season)
│
└─── fpar_maize_X.csv
│    -- crop_name
│    -- adm_id
│    -- date (in the format YYYYMMdd)
│    -- fpar
│
└─── meteo_maize_X.csv
│    -- crop_name
│    -- adm_id
│    -- date (in the format YYYYMMdd)
│    -- tmin (minimum temperature)
│    -- tmax (maximum temperature)
│    -- prec (precipitation)
│    -- rad (radiation)
│    -- tavg (average temperature)
│    -- et0 (potential evapotranspiration)
│    -- cwb (climatic water balance)
│
└─── ndvi_maize_X.csv
│    -- crop_name
│    -- adm_id
│    -- date (in the format YYYYMMdd)
│    -- ndvi
│
└─── soil_maize_X.csv
│    -- crop_name
│    -- adm_id
│    -- awc (available water capacity)
│    -- bulk_density
│    -- drainage_class
│
└─── soil_moisture_maize_X.csv
│    -- crop_name
│    -- adm_id
│    -- date (in the format YYYYMMdd)
│    -- ssm (surface soil moisture)
│    -- rsm (rootzone soil moisture)
│
└─── yield_maize_X.csv
│    -- crop_name
│    -- country_code
│    -- adm_id
│    -- harvest_year
│    -- yield
│    -- harvest_area
│    -- production
```

centroids.zip and polygons.zip contain the geometries of the administrative regions as centroids (x and y coordinates) and polygons (multipolygons), respectively. They are organized as follows:

```
centroids
│
└─── AO
│    │ -- AO.cpg
│    │ -- AO.dbf
│    │ -- AO.prj
│    │ -- AO.shp
│    │ -- AO.shx
│
└─── AR
│    │ -- AR.cpg
│    │ -- AR.dbf
│    │ -- AR.prj
│    │ -- AR.shp
│    │ -- AR.shx
...
```
```
polygons
│
└─── AO
│    │ -- AO.cpg
│    │ -- AO.dbf
│    │ -- AO.prj
│    │ -- AO.shp
│    │ -- AO.shx
│
└─── AR
│    │ -- AR.cpg
│    │ -- AR.dbf
│    │ -- AR.prj
│    │ -- AR.shp
│    │ -- AR.shx
...
```

Data access

The full dataset can be downloaded directly from Zenodo or using the `zenodo_get` library.

License and citation

We kindly ask all users of CY-Bench to properly respect licensing and citation conditions of the datasets included.

Version Notes

* 1.0 is the dataset submitted to the NeurIPS Datasets and Benchmarks Track. The paper and discussions are here: https://openreview.net/forum?id=jkJDNG468g#discussion
* 1.1 and 1.2 fix some issues with column names and mismatches in adm_id between yield data and input data.
* 1.3 includes location information in the form of centroids and polygons of admin regions.
* 1.4 updates the fpar data for 2023. fpar data was incomplete for 2023 in earlier versions (due to unavailability in the data source itself).
* 1.5 fixes an issue in the crop calendar.
* 1.6 fixes an issue in the ndvi time series.
* 1.7 updates storage precision to 3 decimal places to reduce data size.
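The version notes mention mismatches in adm_id between yield data and input data. Users who want to verify this consistency on their own download could do so along the lines of the sketch below, which uses only the Python standard library. It assumes each file has already been read into a list of row dictionaries keyed by column name (for example with `csv.DictReader`) and that dates are stored as YYYYMMdd strings. The function names `parse_cybench_date` and `adm_id_mismatches` are illustrative and are not part of CY-Bench.

```python
from datetime import datetime


def parse_cybench_date(text):
    """Parse a date stored as YYYYMMdd (e.g. "20230111") into a date object."""
    return datetime.strptime(text, "%Y%m%d").date()


def adm_id_mismatches(yield_rows, predictor_rows):
    """Compare the adm_id values found in two tables.

    Returns a pair of sorted lists: ids present only in the yield table,
    and ids present only in the predictor table.
    """
    yield_ids = {row["adm_id"] for row in yield_rows}
    predictor_ids = {row["adm_id"] for row in predictor_rows}
    return sorted(yield_ids - predictor_ids), sorted(predictor_ids - yield_ids)
```

Two empty lists indicate that both files cover the same set of administrative units. Any ids that do appear point to regions with yield labels but no predictors, or the reverse.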