TEST DATA for Enhanced protein isoform characterization through long-read proteogenomics

DOI: 10.5281/zenodo.5234651
Zenodo: 5234651
MaRDI QID: Q6724591
FDO: Q6724591

Dataset published in the Zenodo repository.

Erin Jeffery, Lloyd Smith, Simone Tiberi, Anne Deslattes Mays, Ben Jordan, Robert Millikin, Rachel Miller, Simran Kaur, Michael Shortreed, Gloria Sheynkman, Ana Conesa, Madison Mehlferber, Christina Chatzipantsiou

Publication date: 8 July 2021

Copyright license: Creative Commons Attribution 4.0 International

Description

Test data for "Enhanced protein isoform characterization through long-read proteogenomics".

The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g. PacBio, Oxford Nanopore) provides full-length transcript sequencing, which can be used to predict full-length proteins.

Here, we describe a long-read proteogenomics approach for integrating matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data in protein inference to enable detection of protein isoforms that are intractable to MS detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis.

Companion Repositories:
Long-Read-Proteogenomics Workflow GitHub Repository Release
Long-Read-Proteogenomics Analysis GitHub Repository Release

Companion Datasets:
Jurkat Samples and Reference Data
Long-Read-Proteogenomics Workflow Results using Jurkat Sample data

This repository contains the test data, specifically:
TEST Data for Long-Read-Proteogenomics Workflow GitHub Actions
