RDP Classifier training files for 16S rRNA sequences from GTDB
DOI: 10.5281/zenodo.12703477; Zenodo: 12703477; MaRDI QID: Q6683339; FDO: Q6683339
Dataset published at Zenodo repository.
Publication date: 10 July 2024
Copyright license: Creative Commons Attribution 4.0 International
16S rRNA gene sequences from the Genome Taxonomy Database (GTDB release 220) were used to retrain the RDP Classifier (version 2.13). Two sets of training files are provided:

genus.zip - Genus level
species.zip - Species level

The code in prepare_files.R was used to prepare the GTDB sequence and taxonomy files for retraining the RDP Classifier.

Notes:

Steps to retrain the RDP Classifier are adapted from https://john-quensen.com/tutorials/training-the-rdp-classifier/

Python scripts (lineage2taxTrain.py and addFullLineage.py) are available at https://github.com/rdpstaff/classifier/issues/18

The first 1000 training sequences (train_nodups_1000.fasta) are used for benchmarking the classification accuracy (see results at end of prepare_files.R).
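
As an illustration of the benchmarking subset mentioned in the notes, the following is a minimal Python sketch of how the first 1000 records of a FASTA file could be extracted. It is not the authors' code: the dataset's own preparation steps are in prepare_files.R, and the function names here (read_fasta, first_n_records, write_fasta) are hypothetical helpers introduced only for this example. Only the output name train_nodups_1000.fasta comes from the description; the input file name is not stated there.

```python
from itertools import islice


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle.

    Hypothetical helper for illustration; multi-line sequences are joined.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_n_records(handle, n=1000):
    """Return the first n FASTA records in file order (assumed selection rule)."""
    return list(islice(read_fasta(handle), n))


def write_fasta(records, handle):
    """Write (header, sequence) pairs as single-line FASTA records."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")
```

A possible use, assuming the full training FASTA has been opened as `src` and a new file named train_nodups_1000.fasta as `dst`, would be `write_fasta(first_n_records(src, 1000), dst)`.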