Script and data from: The best of two worlds: toward large-scale monitoring of biodiversity combining metabarcoding and optimised parataxonomic validation.




DOI: 10.5281/zenodo.13760159; Zenodo: 13760159; MaRDI QID: Q6691588; FDO: Q6691588

Dataset published at Zenodo repository.

Axel Bourdonnée, Julien Haran, Gael Kergoat, Alain Migeon, Marie-Pierre Chapuis, Anne-Laure Clamens, Benoit Penel, Laure Benoit, Christine Meynard, Sylvain Piry, Laurent Soldati

Publication date: 13 September 2024



Description

Publication abstract: In a context of unprecedented insect decline, it is critical to have reliable monitoring tools to measure species diversity and its dynamics at large scales. High-throughput DNA-based identification methods, and particularly metabarcoding, were proposed as an effective way to reach this aim. However, these identification methods are subject to multiple technical limitations, resulting in unavoidable false-positive and false-negative species detections. Moreover, metabarcoding does not allow a reliable estimation of species abundance in a given sample, which is key to documenting and detecting population declines or range shifts at large scales. To overcome these obstacles, we propose here a Human-Assisted Molecular Identification (HAMI) approach, a framework based on a combination of metabarcoding and image-based parataxonomic validation of outputs and recording of abundance. We assessed the advantages of using HAMI over the exclusive use of a metabarcoding approach by examining 492 mixed beetle samples from a biodiversity monitoring initiative conducted throughout France. On average, 23% of the species are missed when relying exclusively on metabarcoding, this percentage being consistently higher in species-rich samples. Importantly, on average, 20% of the species identified by molecular-only approaches correspond to false positives linked to cross-sample contamination or misidentified barcode sequences in databases. The combination of molecular methodologies and parataxonomic validation in HAMI significantly reduces the intrinsic biases of metabarcoding and recovers reliable abundance data. This approach also enables users to engage in a virtuous circle of database improvement through the identification of specimens associated with missing or incorrectly assigned barcodes. As such, HAMI fills an important gap in the toolbox available for fast and reliable biodiversity monitoring at large scales.
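The two percentages quoted in the abstract can be read as per-sample set comparisons between the species reported by metabarcoding alone and the species retained after validation. The following Python sketch illustrates that reading only; it is not taken from the deposited R Markdown scripts. The function name `error_rates`, the species labels, and the choice of denominators (validated species for the missed fraction, metabarcoding detections for the false-positive fraction) are assumptions made for illustration.

```python
def error_rates(metabarcoding, validated):
    """Return (missed_fraction, false_positive_fraction) for one sample.

    Assumed definitions (not confirmed by the source):
      missed fraction         = validated species absent from metabarcoding
                                / all validated species
      false-positive fraction = metabarcoding species absent from validation
                                / all metabarcoding species
    """
    metabarcoding = set(metabarcoding)
    validated = set(validated)
    missed = validated - metabarcoding          # false negatives
    false_positives = metabarcoding - validated  # false positives
    missed_fraction = len(missed) / len(validated) if validated else 0.0
    fp_fraction = len(false_positives) / len(metabarcoding) if metabarcoding else 0.0
    return missed_fraction, fp_fraction


# Hypothetical sample: placeholder labels, not species from the dataset.
molecular_only = {"sp_A", "sp_B", "sp_C", "sp_D"}
after_validation = {"sp_A", "sp_B", "sp_C", "sp_E", "sp_F"}
missed_fraction, fp_fraction = error_rates(molecular_only, after_validation)
```

Averaging these two fractions over all samples would give figures comparable in kind to those in the abstract, provided the authors used the same denominators.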
File description: MiSeq raw sequences of the COI barcode from 492 Coleoptera field samples.

The Raw_sequencage_data ZIP directory contains the FASTQ files of the paired-end reads (R1: reads 1; R2: reads 2) produced in duplicate for each Coleoptera field sample using the MiSeq instrument of the GenSeq platform (ISEM, University of Montpellier).

The HAMI_data_script_results_R ZIP directory contains R Markdown script files (.Rmd and .html) and the associated data used to analyse the systematic errors of the metabarcoding approach (N = 492 Coleoptera field samples).

The HAMI_pipeline ZIP directory contains all the code associated with the HAMI pipeline, as well as a ReadMe file and a test data set.

The Residual_chimera.zip directory contains lists of MOTUs associated with residual chimeric sequences that were not filtered out by the FROGS pipeline but were subsequently detected with the de novo approach implemented in the HAMI pipeline, using the isBimeraDenovo R function from DADA2 v1.28.0. It contains two separate files, one for each of the two sequencing runs.

The NUMTS_filtered.zip directory contains lists of MOTUs that were excluded from the final dataset by the NUMTS filtering. File xxx_pseudogene_f1_deteled.csv lists the MOTUs excluded by the first filtering step, based on DNA sequencing. File xxx_pseudogene_f2_deteled.csv lists the merged MOTUs excluded by the second filter, based on occurrence and percentage of identity. This folder contains files for the two sequencing runs.
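Because the raw data are paired-end, each R1 file needs its R2 counterpart before any processing. The sketch below shows one way to pair file names and flag unmatched ones. The naming convention is an assumption: the source does not give the actual file names, so the `_R1`/`_R2` token and the example names are hypothetical, as is the helper `pair_fastq`.

```python
import re
from collections import defaultdict

# Assumption: each file name carries an "_R1" or "_R2" token followed by
# "_" or "." (for example "sampleX_rep1_R1.fastq"). Adjust to the real names.
READ_TOKEN = re.compile(r"_R([12])(?=[_.])")


def pair_fastq(names):
    """Group FASTQ file names into (R1, R2) pairs.

    Returns (pairs, orphans): pairs maps the name with the read token removed
    to an (R1 name, R2 name) tuple; orphans lists files missing their mate.
    Names without a read token are ignored.
    """
    grouped = defaultdict(dict)
    for name in names:
        match = READ_TOKEN.search(name)
        if match is None:
            continue
        key = name[:match.start()] + name[match.end():]
        grouped[key]["R" + match.group(1)] = name
    pairs = {
        key: (reads["R1"], reads["R2"])
        for key, reads in grouped.items()
        if "R1" in reads and "R2" in reads
    }
    orphans = sorted(
        name
        for reads in grouped.values()
        if len(reads) < 2
        for name in reads.values()
    )
    return pairs, orphans


# Hypothetical file names, for illustration only.
example_names = ["s1_rep1_R1.fastq", "s1_rep1_R2.fastq", "s2_rep1_R1.fastq"]
pairs, orphans = pair_fastq(example_names)
```

Any orphan reported by such a check would point to an incomplete download or extraction rather than to a property of the dataset itself.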






