RDP Classifier training files for 16S rRNA sequences from GTDB




DOI: 10.5281/zenodo.12703477
Zenodo: 12703477
MaRDI QID: Q6683339
FDO: Q6683339

Dataset published in the Zenodo repository.

Jeffrey M. Dick

Publication date: 10 July 2024

Copyright license: Creative Commons Attribution 4.0 International



16S rRNA gene sequences from the Genome Taxonomy Database (GTDB release 220) were used to retrain the RDP Classifier (version 2.13). Two sets of training files are provided:

genus.zip - Genus level
species.zip - Species level

The code in prepare_files.R was used to prepare the GTDB sequence and taxonomy files for retraining the RDP Classifier.

Notes:

Steps to retrain the RDP Classifier are adapted from https://john-quensen.com/tutorials/training-the-rdp-classifier/

Python scripts (lineage2taxTrain.py and addFullLineage.py) are available at https://github.com/rdpstaff/classifier/issues/18

The first 1000 training sequences (train_nodups_1000.fasta) are used for benchmarking the classification accuracy (see results at end of prepare_files.R).
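The benchmark subset mentioned above consists of the first 1000 training sequences. The dataset's own preparation code is in prepare_files.R; the following is only a minimal Python sketch of how such a subset could be taken from a FASTA file, assuming standard FASTA formatting (header lines starting with ">", sequences possibly wrapped over several lines). The function names read_fasta, first_n_records, and write_fasta are hypothetical and do not come from the dataset.

```python
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Assumes plain FASTA: each record starts with a '>' header line, and the
    sequence may span several lines. Blank lines are ignored.
    """
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the stream is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def first_n_records(handle: TextIO, n: int) -> List[Tuple[str, str]]:
    """Return the first n records of a FASTA stream, in file order."""
    records: List[Tuple[str, str]] = []
    for record in read_fasta(handle):
        if len(records) >= n:
            break
        records.append(record)
    return records


def write_fasta(records: List[Tuple[str, str]], handle: TextIO) -> None:
    """Write records as FASTA with each sequence on a single line."""
    for header, sequence in records:
        handle.write(f">{header}\n{sequence}\n")
```

To apply this to a real file, one would open the full training FASTA for reading, call first_n_records(handle, 1000), and pass the result to write_fasta with an output handle. The input file name is not given in the description, so it is left unspecified here.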






