
Long-Read Sequencing Helps Detect Protein Isoforms Predicted by RNA


NEW YORK – Long-read sequencing of RNA isoforms can help researchers use mass spectrometry to discover novel peptides of interest in the same samples, a new study suggests.

Using full-length RNA sequencing from Pacific Biosciences, researchers led by the University of Virginia School of Medicine's Gloria Sheynkman were able to better target certain protein isoforms in a stem cell line.

The study is merely proof of concept, Sheynkman said, but shows how pairing the two data types can reveal a previously unseen world of protein complexity. Many of the RNA isoforms were found to have been translated into protein. "We validated most of them, even many that people thought were suspect," she said.

Her lab described the method in a bioRxiv preprint posted earlier this month. They used the RNA data to predict protein isoforms that were then detected using internal standard parallel reaction monitoring (IS-PRM), a type of targeted mass spec. They also spiked in synthetic peptides based on the predictions to help detect the peptides in the sample that had been translated from those RNAs.

These spike-ins "increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms," the authors wrote. "Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms."

The study demonstrates that targeted mass spec "can uniquely enable us to bridge the gap in proteogenomics," Qing Yu, a researcher at UMass Chan Medical School who helped develop the Tomahto method used in the paper but who was not involved in the study, said in an email. "These peptides from protein isoforms usually exist in extremely low abundance and are oftentimes challenging to characterize using traditional proteomics approaches. However, they are critical to understanding protein functions and elucidating dysregulated signaling pathways in diseases."

Identifying protein isoforms from full-length RNA "is a golden application of targeted mass spectrometry, allowing biologists to begin to understand which variants are ultimately expressed," Jeff Whiteaker, a clinical mass spec expert at the University of Washington who was not involved in the study, said in an email. "This will help lead to a better understanding of cancer biology, potentially improving biomarkers and identifying neoantigens."

Sheynkman's method is part of a movement in proteomics to deal with the deluge of data that comes off a mass spectrometer. "If you take the whole human proteome and digest it, you'd get 2 or 3 million peptides," Sheynkman said. "The peptides we want are less than 1 percent of that, but if we can detect those, it gives us a lot of information."

She compared the standard mass spec process to sitting by the radio and waiting for your favorite songs to come up. "Now, instead of waiting, we can get it from Spotify."

The approach also makes use of recent advances in long-read RNA sequencing library preparation. Her study used the PacBio Kinnex kit, a product designed to improve long-read sequencing efficiency for RNA molecules that are longer than short-read methods can cover but too short to be analyzed efficiently on their own on one of the firm's instruments.

Sheynkman noted that long reads from Oxford Nanopore Technologies could fit into the data analysis pipeline, and even short-read data could be informative, but the accuracy of PacBio's HiFi sequencing, along with falling long-read costs and increasing read lengths, made it an attractive choice.

IS-PRM could help fill in the "big gap between all the isoforms that RNA-seq asserts could be there and what is actually expressed," said Neil Kelleher, a researcher at Northwestern University who was not involved with the study but who is part of the Consortium for Top-Down Proteomics with Sheynkman. The Clinical Proteomic Tumor Analysis Consortium "took a giant run at this question eight years ago. … They used the best, deepest proteomics they could and couldn't really come out with the protein isoforms," he said. "But [Sheynkman's] doing it. She has kind of moved the needle here."

In essence, her method generates a sample-specific reference database for protein isoforms, or proteoforms. This can be done in as little as a week "if the stars align," she said. Generating the target list is "trivial" once you have the RNA data, but making the synthetic spike-in peptides to help find them is expensive and time-consuming, Yu said.
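To illustrate what a target list derived from predicted isoforms might look like, the following is a minimal sketch, not the Sheynkman lab's pipeline. It assumes the protein isoform sequences have already been predicted from the long-read RNA data, applies a simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), and keeps only peptides that occur in exactly one isoform, since those are the peptides that could distinguish one isoform from another in a targeted assay. The function names, the peptide length window, and the toy sequences are all hypothetical.

```python
import re
from collections import defaultdict


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 25) -> set[str]:
    """Simplified in silico trypsin digest (hypothetical length window).

    Splits at zero-width positions just after K or R that are not followed
    by P; missed cleavages and modifications are ignored.
    """
    # Zero-width splitting requires Python 3.7+.
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in peptides if min_len <= len(p) <= max_len}


def isoform_specific_peptides(isoforms: dict[str, str]) -> dict[str, set[str]]:
    """Return, for each isoform, the peptides found in no other isoform."""
    digests = {name: tryptic_digest(seq) for name, seq in isoforms.items()}
    # Count how many isoforms each peptide appears in.
    counts: dict[str, int] = defaultdict(int)
    for peptides in digests.values():
        for peptide in peptides:
            counts[peptide] += 1
    # A peptide seen in exactly one isoform can, in principle, report on it alone.
    return {
        name: {p for p in peptides if counts[p] == 1}
        for name, peptides in digests.items()
    }


if __name__ == "__main__":
    # Toy isoforms sharing a first and last peptide but differing in the middle.
    toy = {
        "isoform_A": "AGDEFLLIKGSTWEEVLLRYQNNASTPAK",
        "isoform_B": "AGDEFLLIKVLDSHEELGKYQNNASTPAK",
    }
    print(isoform_specific_peptides(toy))
    # → {'isoform_A': {'GSTWEEVLLR'}, 'isoform_B': {'VLDSHEELGK'}}
```

In a real workflow, such candidate peptides would still need to be filtered for properties that affect detectability before synthetic standards are ordered, which is where the cost and time Yu describes come in.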

He noted that he has developed an alternative targeted mass spec method, called GoDig, that enables targeting of any peptides without standards. "It could be adopted along with predicted mass spectra to obviate the need to make synthetic internal standard peptides," he said. "In theory, GoDig can target any novel peptides predicted based on transcriptomics. I would love to see it applied to this type of study."

In the short term, applications include discovering new peptides and validating and annotating proteoforms, but Sheynkman's long-term goal is to take this to patients.

"We could call it a personalized proteome," she said, referring to the target list generated from RNA data. "And then from those predictions, we'd find many novel protein isoforms that may be specific to that sample's specific biology or, eventually, a disease subtype."

Yu suggested this approach could also "provide an opportunity for the community to develop isoform-specific therapeutics."

For now, Sheynkman's lab is trying to determine which isoforms might be relevant for a disease. "A lot of the peptides and thus protein forms that get reported in literature or that are the focus of existing assays, they won the mass spec lottery," she said. By chance, they're detectable because they ionize or travel well in the mass spectrometer, "but that doesn't really have anything to do with maybe the disease or the importance of that protein."

"One way to change that is to really target the peptides that may not be as detectable, and we certainly found cases where the peptide was not sampled at all when you do a shotgun approach, but then when we target that peptide, we confirmed it is there," said Sheynkman.

Assays that quantify those peptides could each be turned into a US Food and Drug Administration-approved test, she said, but her great hope is to develop targeted panels for isoform-specific biomarkers.

And "top-down" proteomics methods, such as protein sequencing, could be an even better partner to long reads than mass spec, Kelleher said. "If both were deep technologies, that would give you the best understanding of how information flows from DNA to RNA to protein."