Harmonized Tree Species Occurrence Points for Europe
DOI: 10.5281/zenodo.5524611, Zenodo: 5524611, MaRDI QID: Q6689019, FDO: Q6689019
Dataset published at Zenodo repository.
Johannes Heisig, Tomislav Hengl
Publication date: 1 October 2020
Copyright license: Creative Commons Attribution 4.0 International
This data set is a harmonized collection of existing data from GBIF, the EU-Forest project and the LUCAS survey. It has about 3 million observations and is supplemented by variables (e.g. location accuracy, land cover type, canopy height) that enable precise filtering for specific user applications. The RDS file is created from an sf object and is suitable for fast reading in the R programming environment. The CSV.GZ file contains the records as a table with easting and northing in the coordinate reference system ETRS89 / LAEA Europe (EPSG:3035) and can be loaded into a GIS after being unzipped. The code producing this data set is publicly available on GitLab. Data sets were last updated in September 2021.

Variables:

id = unique point identifier
easting = x coordinate
northing = y coordinate
country = ISO country code
species = Latin species name
genus = genus name
scientific_name = long species name
gbif_taxon_key = taxon key from GBIF
gbif_genus_key = genus key from GBIF
taxon_rank = species or genus
year = year of observation
accessed_through = database through which data was accessed (GBIF, LUCAS, EU-Forest)
dataset_info = data set name (individual sub-data-set)
citation = DOI citation of the individual data set
license = distribution license
location_accuracy = spatial accuracy of observation (meters)
flag_location_issue = known location issues present
flag_date_issue = known date issues present
eoo = extent of occurrence, applying the concept of natural geographical range used for the EU-Forest data set (Mauri et al., 2017) to all other data points (1 = point inside species range; 0 = point outside; NA = EOO polygon not available for this species)
dbh = diameter at breast height (only recorded for observations from the EU-Forest data set (Mauri et al., 2017))
lc1 = LUCAS land cover type 1 (only recorded for observations from LUCAS data)
lc2 = LUCAS land cover type 2 (only recorded for observations from LUCAS data)
landmask_country = land mask overlay, 30 meters (NA = not on land)
corine = CORINE 2018 land cover type (extracted from the 100 meter raster data set)
nightlights = light pollution observed by VIIRS (proxy for remoteness / distance to human structures)
canopy_height = canopy height derived from GEDI waveform LiDAR point data
natura_2000 = Natura 2000 site code (if a point falls inside a protected area (GIS layer), this variable contains the site identification code; all sites can be explored on an interactive map)
freq_location = number of points with identical location (in some cases one location has multiple observations, differing in species and/or year; this may lead to difficulties in certain modeling tasks)
geometry = point geometry in ETRS89 / LAEA Europe

See the detailed documentation for more insights into each variable and for the individual GBIF data set citations. If you would like to know more about the creation of this data set, see the R Markdown document describing the process (GitLab repository) and the talk at OpenGeoHub Summer School 2020 (YouTube).

Some advice: This data set is a puzzle with pieces from many different sources. Take some time to explore it before including it in your work. Use summary statistics to see which variables have NAs and how many. Choose your filtering criteria wisely. For example, some points with the highest location accuracy have no record for the year of observation. You would exclude these if your filter included a condition on year (for example, a threshold at 1990).
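The advice above (count NAs per variable, then filter with care) can be sketched in Python using only the standard library. This is a minimal illustration, not the authors' workflow: the function names and the file name in the comment are hypothetical, and the sketch assumes that missing values in the CSV.GZ file are written as "NA" or left empty, which should be checked against the actual file. The column names year and location_accuracy are taken from the variable list.

```python
import csv
import gzip
from collections import Counter

# Assumption: missing values appear as "NA" (R's default) or as empty strings.
MISSING = {"", "NA"}


def read_rows(path):
    """Yield one dict per record from a gzip-compressed CSV file."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def count_missing(rows):
    """Return (number of rows, Counter of missing values per column)."""
    missing = Counter()
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            if column is None:
                continue  # surplus fields in a malformed line
            if value is None or value.strip() in MISSING:
                missing[column] += 1
    return total, missing


def keep_point(row, max_accuracy_m, min_year, keep_missing_year=False):
    """Decide whether a record passes a location accuracy and year filter."""
    accuracy = (row.get("location_accuracy") or "").strip()
    if accuracy in MISSING or float(accuracy) > max_accuracy_m:
        return False
    year = (row.get("year") or "").strip()
    if year in MISSING:
        # Accurate points without a year are lost unless kept explicitly.
        return keep_missing_year
    return int(float(year)) >= min_year


# Small in-memory example; with the real file one would instead use
# rows = read_rows("tree_points.csv.gz")  # hypothetical file name
sample_rows = [
    {"id": "1", "year": "2015", "location_accuracy": "10"},
    {"id": "2", "year": "NA", "location_accuracy": "1"},
    {"id": "3", "year": "1985", "location_accuracy": "500"},
]
total, missing = count_missing(sample_rows)
print(total, dict(missing))
print([row["id"] for row in sample_rows if keep_point(row, 100, 1990)])
```

The keep_missing_year switch makes the trade-off explicit: a year threshold silently drops the accurate point with no recorded year unless the caller opts to keep it.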
This work has received funding from the European Union's Innovation and Networks Executive Agency (INEA) under Grant Agreement Connecting Europe Facility (CEF) Telecom project 2018-EU-IA-0095 (https://ec.europa.eu/inea/en/connecting-europe-facility/cef-telecom/2018-eu-ia-0095).