RNAgrail dataset and model

From MaRDI portal
Dataset:6699042



DOI: 10.5281/zenodo.13757098
Zenodo: 13757098
MaRDI QID: Q6699042
FDO: Q6699042

Dataset published in the Zenodo repository.

Marta Szachniuk, Maciej Antczak, Justyna Marek, Craig Zirbel

Publication date: 13 September 2024

Copyright license: Creative Commons Attribution 4.0 International



RNAgrail: RNA 3D Structure Prediction Dataset and Model

This repository contains the datasets and the pre-trained model associated with RNAgrail, a diffusion-based graph neural network for RNA 3D structure prediction. The data is split across several files that provide the resources for training, validating, and testing the model, together with a pre-trained model ready for inference.

Data Overview:

rRNA_tRNA.tar.gz: Raw PDB files with extracted descriptors from ribosomal RNA (rRNA) and transfer RNA (tRNA) structures.
non_rRNA_tRNA.tar.gz: Raw PDB files with extracted descriptors from RNA molecules that are neither rRNA nor tRNA. These serve as a separate test set.
train-pkl.tar.gz: Filtered and preprocessed pickle files for the training set, derived from the rRNA_tRNA dataset. These files are used to train RNAgrail.
val-pkl.tar.gz: The validation set, a subset of the training data from train-pkl.tar.gz.
test-pkl.tar.gz: Preprocessed pickle files for the test set, derived from the non_rRNA_tRNA dataset. Because these descriptors come from RNAs other than rRNA and tRNA, they provide a challenging test scenario.
model_epoch_800.tar.gz: The pre-trained RNAgrail model after 800 epochs of training on the train-pkl dataset, ready for inference and evaluation.
all-outputs.txt: Basic metadata about all descriptors: file name, number of segments, number of nucleotides, sequence of each segment, and positions of the segments in the original PDB files.

Use of Data and Model:

The raw PDB files can be used for RNA descriptor extraction, while the pickle files are preprocessed for direct use in training, validation, and testing workflows. The RNAgrail model in model_epoch_800.tar.gz can be used to run inference on new RNA data or to reproduce the results of the associated paper.

How to Use:

Training: train-pkl.tar.gz contains data that can be used to retrain the RNAgrail model from scratch.
Validation: val-pkl.tar.gz can be used to validate the model during or after training.
Testing: Use test-pkl.tar.gz to evaluate the model's performance on RNA types it was not trained on (non-rRNA and non-tRNA).
Inference: model_epoch_800.tar.gz is ready for inference on new RNA sequences.

Acknowledgments:

If you use this dataset or the pre-trained model in your research, please cite the associated paper (linked here once published). Thank you for your interest in RNAgrail!
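The preprocessed sets (train-pkl.tar.gz, val-pkl.tar.gz, test-pkl.tar.gz) are distributed as .tar.gz archives, so they can be unpacked and loaded with the Python standard library before being handed to a training or evaluation script. The following is a minimal sketch under stated assumptions, not part of RNAgrail itself: the helper names extract_archive and load_pickles are hypothetical, it assumes the pickle files inside each archive use the .pkl extension, and, because the internal structure of the pickled descriptors is not documented on this page, the loaded objects are returned unchanged. Only unpickle files from a source you trust, since pickle.load can execute arbitrary code.

```python
import pickle
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tarball (e.g. test-pkl.tar.gz) into dest_dir.

    Hypothetical helper, not an RNAgrail API.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    return dest


def load_pickles(directory):
    """Load every *.pkl file found (recursively) under directory.

    Assumption: descriptor files use the .pkl extension. Results are keyed by
    file stem; files with the same stem in different subfolders overwrite each
    other. The objects are returned as-is because their layout is not
    documented here.
    """
    records = {}
    for pkl_path in sorted(Path(directory).rglob("*.pkl")):
        with open(pkl_path, "rb") as fh:
            records[pkl_path.stem] = pickle.load(fh)
    return records
```

For example, `records = load_pickles(extract_archive("test-pkl.tar.gz", "data/test"))` would give a dictionary mapping each descriptor file name to its unpickled contents, which can then be inspected to see what fields the preprocessing step produced.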






