Simulated wastewater sequencing data for benchmarking SARS-CoV-2 variant abundance estimation
DOI: 10.5281/zenodo.5307070
Zenodo: 5307070
MaRDI QID: Q6698994
FDO: Q6698994
Dataset published at Zenodo repository.
Jordan Peccia, Jasmijn A. Baaijens, Rebecca Littlefield, Chaney C. Kalinich, Kyle McElroy, Mallery I. Breban, Joseph R. Fauver, Birgitte B. Simen, William P. Hanage, Isabel M. Ott, Nathan D. Grubaugh, Yale SARS-CoV-2 Genomic Surveillance Initiative, Claire Duvallet, Alex Plocik, Tara Alpert, Michael Baym, Alessandro Zulli, Malaika McKenzie-Bennett, Mary E. Petrone, Rebecca Schilling, Maxim Imakaev, Keith Robison, Martha Pierson, Michelle Spencer, Newsha Ghaeli, Chantal B.F. Vogels
Publication date: 29 August 2021
Copyright license: Creative Commons Attribution 4.0 International
To evaluate the accuracy of variant abundance predictions from wastewater sequencing, we built a collection of benchmarking datasets that resemble real wastewater samples. For each variant (B.1.1.7, B.1.351, B.1.427, B.1.429, P.1) we created a series of 33 benchmarks by simulating sequencing reads from a variant genome together with a collection of background (non-variant of concern/interest) sequences, such that the variant abundance ranges from 0.05% to 100%. Analogously, we created a second series of benchmarks by simulating reads only from the Spike gene of each SARS-CoV-2 genome. We refer to the first set of benchmarks as whole genome (WG) and to the second set as S-only. We repeated these simulations at different sequencing depths: 100x and 1000x coverage for the WG benchmarks, and 100x, 1000x, and 10,000x coverage for the S-only benchmarks.
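The benchmark design described above can be sketched as a small grid of variant, benchmark set, and sequencing depth. The following Python snippet is only an illustration: the names VARIANTS, DEPTHS, configurations, and split_coverage are hypothetical and are not taken from the dataset or its accompanying code. The 33 abundance levels are not listed in this description, so they are not enumerated here. The coverage split assumes that variant abundance is expressed as a share of the total simulated coverage, which may differ from how the reads were actually generated.

```python
# Illustrative sketch of the benchmark grid; all names here are hypothetical.

VARIANTS = ["B.1.1.7", "B.1.351", "B.1.427", "B.1.429", "P.1"]

# Sequencing depths (fold coverage) stated in the description for each set.
DEPTHS = {
    "WG": [100, 1000],
    "S-only": [100, 1000, 10000],
}


def configurations():
    """Yield one (variant, benchmark_set, depth) tuple per benchmark series."""
    for variant in VARIANTS:
        for benchmark_set, depths in DEPTHS.items():
            for depth in depths:
                yield variant, benchmark_set, depth


def split_coverage(total_depth, variant_percent):
    """Split total coverage into (variant, background) coverage.

    Assumption: abundance is the variant's percentage of total coverage.
    """
    if not 0.0 <= variant_percent <= 100.0:
        raise ValueError("variant_percent must be between 0 and 100")
    variant_depth = total_depth * variant_percent / 100.0
    return variant_depth, total_depth - variant_depth


configs = list(configurations())

if __name__ == "__main__":
    for variant, benchmark_set, depth in configs:
        print(f"{variant}\t{benchmark_set}\t{depth}x")
```

Under this assumption, the lowest abundance (0.05%) at 1000x total coverage corresponds to roughly 0.5x coverage of the variant genome, which illustrates why low-abundance variants are hard to detect at shallow depth.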