RNAgrail dataset and model (Q6699042)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| default for all languages | No label defined | | |
| English | RNAgrail dataset and model | Dataset published at Zenodo repository. | |
Statements
RNAgrail: RNA 3D Structure Prediction Dataset and Model

This repository contains the datasets and the pre-trained model associated with RNAgrail, a diffusion-based graph neural network for RNA 3D structure prediction. The data is split across several files that provide the resources for training, validating, and testing the model, together with a pre-trained model ready for inference.

Data Overview:

- rRNA_tRNA.tar.gz: raw PDB files with extracted descriptors from ribosomal RNA (rRNA) and transfer RNA (tRNA) structures.
- non_rRNA_tRNA.tar.gz: raw PDB files with extracted descriptors from RNA molecules that are neither rRNA nor tRNA. These serve as a separate test set.
- train-pkl.tar.gz: filtered and preprocessed pickle files for the training set, derived from the rRNA_tRNA dataset and used to train RNAgrail.
- val-pkl.tar.gz: the validation set, a subset of the training data from train-pkl.tar.gz.
- test-pkl.tar.gz: preprocessed pickle files for the test set, derived from the non_rRNA_tRNA dataset. Because these descriptors come from RNAs that are neither rRNA nor tRNA, they provide a challenging test scenario.
- model_epoch_800.tar.gz: the pre-trained RNAgrail model after 800 epochs of training on the train-pkl dataset, ready for inference and evaluation.
- all-outputs.txt: basic metadata for every descriptor (file name, number of segments, number of nucleotides, sequence of each segment, and positions of the segments in the original PDB files).

Use of Data and Model:

The raw PDB files can be used for RNA descriptor extraction, while the pickle files are preprocessed for direct use in training, validation, and testing workflows. The RNAgrail model in model_epoch_800.tar.gz can be used to run inference on new RNA data or to reproduce the results of the associated paper.

How to Use:

Training: train-pkl.tar.gz contains the data needed to retrain the RNAgrail model from scratch.
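
A minimal sketch of reading the preprocessed pickle files with Python's standard library follows. It assumes that train-pkl.tar.gz (and likewise val-pkl.tar.gz and test-pkl.tar.gz) stores one pickled sample per file with a .pkl suffix. The type of each unpickled object is not documented here, so the helper `iter_pickles` (a hypothetical name, not part of RNAgrail) returns the objects unchanged. If the pickles reference classes from RNAgrail's own code or its dependencies, those packages must be importable for `pickle.load` to succeed.

```python
import pickle
import tarfile
from pathlib import Path
from typing import Any, Iterator, Tuple, Union


def iter_pickles(archive: Union[str, Path]) -> Iterator[Tuple[str, Any]]:
    """Yield (member name, unpickled object) for each .pkl file in a gzipped tar archive.

    Assumption (hypothetical layout): each sample is stored as its own ".pkl" file.
    """
    with tarfile.open(archive, mode="r:gz") as tar:
        for member in tar:
            # Skip directories, links, and any non-pickle files in the archive.
            if not member.isfile() or not member.name.endswith(".pkl"):
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            with fh:
                # Unpickling can execute arbitrary code: only load archives you trust.
                yield member.name, pickle.load(fh)
```

For example, `for name, sample in iter_pickles("train-pkl.tar.gz"): print(name, type(sample))` lists each training file and the type of object it holds. Reading members one at a time avoids unpacking the whole archive to disk first.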
Validation: val-pkl.tar.gz can be used to validate the model during or after training.

Testing: test-pkl.tar.gz can be used to evaluate the model's performance on RNA types it was not trained on (non-rRNA and non-tRNA).

Inference: model_epoch_800.tar.gz is ready for inference on new RNA sequences.

Acknowledgments: If you use this dataset or the pre-trained model in your research, please cite the associated paper (linked here once published). Thank you for your interest in RNAgrail!
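
Because the test set is meant to contain only RNA types absent from training, a quick sanity check is to compare the file names in train-pkl.tar.gz and test-pkl.tar.gz. The sketch below assumes each descriptor is stored as one file whose name (without directory or suffix) identifies it. The helpers `member_stems` and `shared_stems` are hypothetical names, not part of the RNAgrail release.

```python
import tarfile
from pathlib import Path, PurePosixPath
from typing import Set, Union


def member_stems(archive: Union[str, Path]) -> Set[str]:
    """Return the base names, without directory or final suffix, of all regular files in a tar archive."""
    # "r:*" lets tarfile detect the compression (gzip for the .tar.gz files here).
    with tarfile.open(archive, mode="r:*") as tar:
        return {PurePosixPath(m.name).stem for m in tar.getmembers() if m.isfile()}


def shared_stems(archive_a: Union[str, Path], archive_b: Union[str, Path]) -> Set[str]:
    """Return file stems present in both archives (a coarse, name-based overlap check)."""
    return member_stems(archive_a) & member_stems(archive_b)
```

An empty result from `shared_stems("train-pkl.tar.gz", "test-pkl.tar.gz")` is consistent with the test descriptors being absent from the training files, but a name-based check cannot detect identical structures stored under different names.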
13 September 2024