Continental Europe land cover mapping at 30m resolution based CORINE and LUCAS on samples

DOI: 10.5281/zenodo.4725429 | Zenodo: 4725429 | MaRDI QID: Q6689012 | FDO: Q6689012

Dataset published at Zenodo repository.

Martijn Witjes, Martin Landa, Leandro Parente, Tomislav Hengl, Lukas Brodsky

Publication date: 1 March 2021

Copyright license: Creative Commons Attribution 4.0 International

Annual land cover mapping for continental Europe based on Ensemble Machine Learning (EML), samples obtained from LUCAS (Land Use and Coverage Area frame Survey) and CLC (CORINE Land Cover) maps, and several harmonized raster layers (e.g. GLAD Landsat ARD imagery and the Continental EU DTM). The EML predicted the dominant land cover, probabilities and uncertainties for 33 classes compatible with CLC over 20 years (2000-2019), and was implemented in R and Python (eumap library).
The raster layers consisted mainly of GLAD Landsat ARD imagery, which was downloaded for the years 1999 to 2020 over the continental Europe extent (land mask area and tiling system), screened to reduce cloud cover (GLAD quality assessment band), aggregated by season using three quantiles (25th, 50th and 75th), and gap-filled using the Temporal Moving Window Median approach available in the eumap library. The images for each season were selected using the same calendar dates for the whole period:
Winter: December 2 of the previous year until March 20 of the current year
Spring: March 21 until June 24 of the current year
Summer: June 25 until September 12 of the current year
Fall: September 13 until December 1 of the current year
In addition to Landsat spectral data, the EML considered night lights (VIIRS/SUOMI NPP), global surface water frequency, the Continental EU DTM, Landsat spectral indices (SAVI, NDVI, NBR, NBR2, REI and NDWI) and the maximum and minimum monthly geometric temperature, estimated on a pixel basis and for each month.
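The seasonal windows and the quantile aggregation described above can be illustrated with a minimal sketch. It is not the eumap implementation: the function names `season_windows` and `seasonal_quantiles` are hypothetical, and the sketch assumes that cloud-screened observations are already marked as NaN in the input array.

```python
from datetime import date

import numpy as np


def season_windows(year):
    """Return the inclusive (start, end) dates of the four seasons for a year."""
    return {
        "winter": (date(year - 1, 12, 2), date(year, 3, 20)),
        "spring": (date(year, 3, 21), date(year, 6, 24)),
        "summer": (date(year, 6, 25), date(year, 9, 12)),
        "fall": (date(year, 9, 13), date(year, 12, 1)),
    }


def seasonal_quantiles(dates, values, year, season):
    """Aggregate a time series into its 25th, 50th and 75th percentiles.

    dates  : sequence of datetime.date, one per observation
    values : array of shape (n_obs, ...); NaN marks cloud-screened
             observations (an assumption of this sketch)
    """
    start, end = season_windows(year)[season]
    # Keep only the observations that fall inside the seasonal window.
    keep = np.array([start <= d <= end for d in dates])
    stack = np.asarray(values, dtype=float)[keep]
    # NaN-aware percentiles along the time axis; result has shape (3, ...).
    return np.nanpercentile(stack, [25, 50, 75], axis=0)
```

A pixel with no valid observation in a season yields NaN here; in the workflow described above such gaps are what the Temporal Moving Window Median step fills.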
The training data were obtained from the geographic location of LUCAS points (in-situ source) and the centroid of all CORINE polygons (supplementary source), harmonized according to the 33 CLC classes and organized by year, where each unique combination of longitude, latitude and year was treated as an independent sample with the following classes (the class descriptions are here):
111: Urban fabric
122: Road and rail networks and associated land
123: Port areas
124: Airports
131: Mineral extraction sites
132: Dump sites
133: Construction sites
141: Green urban areas
211: Non-irrigated arable land
212: Permanently irrigated arable land
213: Rice fields
221: Vineyards
222: Fruit trees and berry plantations
223: Olive groves
231: Pastures
311: Broad-leaved forest
312: Coniferous forest
321: Natural grasslands
322: Moors and heathland
323: Sclerophyllous vegetation
324: Transitional woodland-shrub
331: Beaches, dunes, sands
332: Bare rocks
333: Sparsely vegetated areas
334: Burnt areas
335: Glaciers and perpetual snow
411: Inland wetlands
421: Maritime wetlands
511: Water courses
512: Water bodies
521: Coastal lagoons
522: Estuaries
523: Sea and ocean
The LUCAS points with a unique land cover class received a confidence rating of 100%, while CORINE points received 85%; the EML used these values as sample weights in the training phase. The points were used in a space-time overlay approach, which considered the location and the year to retrieve the pixel values of all rasters. Samples of some land cover classes (i.e. 111, 122, 131, 141, 211, 221, 222, 223, 231, 311, 312, 321, 411, 512) were screened for agreement with pre-existing mapping products (OSM roads, OSM railways and Copernicus-OSM buildings; Copernicus high resolution layers). For example, 111: Urban fabric samples located in low density building areas (< 50% according to the Copernicus-OSM building layer) were removed. The final training data comprise ~5.3 million samples and 178 covariates/features.
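The space-time overlay and the source-based sample weights can be sketched as follows. This is a simplified illustration, not the eumap code: the function `spacetime_overlay`, the `SOURCE_WEIGHT` table, the in-memory `rasters_by_year` dictionary and the north-up grid defined by `origin` and `pixel_size` are all assumptions made for the example. Only the weights (100% for LUCAS, 85% for CORINE) come from the description above.

```python
import numpy as np

# Confidence ratings stated in the description, used as sample weights.
SOURCE_WEIGHT = {"lucas": 1.00, "corine": 0.85}


def spacetime_overlay(samples, rasters_by_year, origin, pixel_size=30.0):
    """Retrieve covariate values for each (x, y, year) sample.

    samples         : iterable of (x, y, year, source) tuples
    rasters_by_year : dict mapping year -> array (n_layers, rows, cols)
                      (hypothetical in-memory stand-in for the raster stack)
    origin          : (x_min, y_max) of the upper-left grid corner
    Returns (features, weights) as NumPy arrays.
    """
    x_min, y_max = origin
    features, weights = [], []
    for x, y, year, source in samples:
        # Convert map coordinates to array indices on a north-up grid.
        col = int((x - x_min) // pixel_size)
        row = int((y_max - y) // pixel_size)
        # The year selects the raster stack, the location selects the pixel.
        features.append(rasters_by_year[year][:, row, col])
        weights.append(SOURCE_WEIGHT[source])
    return np.array(features), np.array(weights)
```

The sketch does no bounds checking or reprojection; a real workflow would read georeferenced rasters and handle points that fall outside the grid. The returned weights could then be passed as per-sample weights to whichever learner is being trained.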
Using this training data, three ML models (Random Forest, XGBoost and an Artificial Neural Network) were trained to predict probabilities, which served as input to train a linear meta-model (a logistic regression classifier) responsible for predicting the final land cover probabilities of all classes. The hyperparameter optimization was conducted using a 5-fold spatial cross-validation based on a 30x30 km tiling system. The uncertainties were calculated for all classes as the standard deviation of the three predicted probabilities for each pixel, and the class with the highest probability was selected as the dominant land cover class, resulting in 20 annual maps for continental Europe.
The training samples, covariates/features and fitted models are available through lcv_landcover.hcl_lucas.corine.eml_p_landmapper_full.lz4, a LandMapper class instance that can be loaded by the eumap library (check the code demonstration). The production code used to generate the current version of the annual land cover maps is available in the spatial layer repository and uses a lighter LandMapper class instance (lcv_landcover.hcl_lucas.corine.eml_p_landmapper_light.lz4), which does not include the training samples.
Only the dominant land cover classes are provided here. To access the probabilities and uncertainties use:
Open Data Science Europe viewer: https://maps.opendatascience.eu
S3 Cloud Object Service: https://medium.com/swlh/europe-from-above-space-time-machine-learning-reveals-our-changing-environment-1b05cb7be520
A publication describing in detail all processing steps, the accuracy assessment and a general analysis of land cover changes in continental Europe is under preparation. To suggest any improvement or fix, use https://gitlab.com/geoharmonizer_inea/spatial-layers/-/issues
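The per-class uncertainty and the dominant class can be derived from the model outputs as in the sketch below. It is only an illustration under stated assumptions: the function name `summarize_ensemble` and the array layouts are hypothetical, the meta-model itself is not reproduced (its output is taken as an input), and the use of the population standard deviation (`ddof=0`) is an assumption, since the description does not say which estimator was used.

```python
import numpy as np


def summarize_ensemble(base_probs, final_probs, class_codes):
    """Derive uncertainty and dominant class from ensemble outputs.

    base_probs  : array (n_models, n_pixels, n_classes) with the
                  probabilities predicted by the base models
    final_probs : array (n_pixels, n_classes) predicted by the meta-model
    class_codes : sequence of n_classes land cover codes (e.g. 311, 312)
    """
    # Uncertainty: spread of the base-model probabilities per pixel and class.
    # ddof=0 (population standard deviation) is an assumption of this sketch.
    uncertainty = np.std(base_probs, axis=0)
    # Dominant class: the code with the highest final probability per pixel.
    dominant = np.asarray(class_codes)[np.argmax(final_probs, axis=1)]
    return uncertainty, dominant
```

For a single pixel with base probabilities 0.6, 0.8 and 0.7 for one class, the uncertainty for that class is about 0.08; the closer the three models agree, the closer this value is to zero.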
