New Melt Curve Machine Learning Method Enables Large Scale Genotyping of Sequence Variants

NEW YORK (GenomeWeb) — High-resolution melt curve analysis is a robust and simple way of querying the genotypes present in heterogeneous samples, since curve shapes and melting temperatures are related to nucleic acid sequence.

Now, researchers at Johns Hopkins and Stanford Universities have developed a method to pluck quantitative information about all sequence variants in a mixture. The method uses an algorithm relying on the iterative process of machine learning to automatically classify HRM curves, and can enable large-scale genotype assessment in unknown samples.

Published last week in PLoS One, the algorithm was tested in silico using nine simulated experimental conditions. It managed to classify 92 known serotypes of Streptococcus pneumoniae with 99 percent accuracy after eight training curves per serotype. The algorithm was then tested on six simulated methylation levels of RASSF1A, a tumor suppressor gene inactivated by hypermethylation of a CpG island promoter. Here, the algorithm was 100 percent accurate at identifying all sequence variants in vitro after three training curves.

Samuel Yang, now an associate professor of surgery at Stanford University Medical Center, was one of the principal investigators on the study. "As a simple add-on to any qPCR instrumentation, this technology enables reliable direct sequence profiling, which has broad research, epidemiological, and clinical applications," he said in an email to PCR Insider this week.

"Anything that requires reliable large-scale sequence profiling can use this algorithm with HRM," he added, noting that the method can reliably match unknown melt curves derived from PCR against a large reference database of curves in order to profile sequences.

Previously at JHU, Yang co-authored a study in October 2013 in Nucleic Acids Research on a technique called universal digital high-resolution melt, or U-dHRM.

According to that study, by partitioning sequence variants into droplets and measuring the melt curve on a per-droplet basis, Yang and colleagues could perform "absolute quantification and identification of numerous target genotypes, including discovery of unexpected or unknown species, in a heterogeneous sample," all using a generic fluorescent reporter.

Designing and testing the new algorithm took two years, Yang said this week, and required the expertise of a multidisciplinary team that included an emergency medicine clinician, a molecular microbiologist, biomedical engineers, and computer scientists.

Their labors hinged, in part, on finding the best machine learning paradigm to allow the computer to cluster and classify variants. They tested three other approaches, but ultimately settled on support vector machine learning, or SVM.
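To give a rough sense of how an SVM can separate melt curves, here is a minimal sketch that is not the published algorithm. It assumes toy sigmoid-shaped curves for two hypothetical variants with melting temperatures of 78 and 82 degrees C, adds small run-to-run shifts and noise, and trains a binary linear SVM with a Pegasos-style stochastic subgradient update on eight training curves per variant. The function names, temperatures, noise levels, and the choice of a linear two-class model are all illustrative assumptions; the study's actual features, kernel, and multi-class strategy are not described here.

```python
import math
import random

# Temperature grid in degrees C (70.0 to 90.0 in 0.5 degree steps); an assumption.
TEMPS = [70 + 0.5 * i for i in range(41)]


def simulate_curve(tm, rng, width=1.0, noise=0.02, tm_jitter=0.3):
    """Toy melt curve: fluorescence falls along a sigmoid centered near tm.

    tm_jitter mimics run-to-run variation; noise mimics measurement error.
    """
    shifted_tm = tm + rng.uniform(-tm_jitter, tm_jitter)
    return [
        1.0 / (1.0 + math.exp((t - shifted_tm) / width)) + rng.gauss(0.0, noise)
        for t in TEMPS
    ]


def train_linear_svm(curves, labels, lam=0.01, steps=2000, seed=0):
    """Binary linear SVM (hinge loss, L2 penalty) via Pegasos-style updates.

    Labels must be +1 or -1. Curves are centered on the training mean so the
    separating hyperplane can pass through the origin (no bias term).
    """
    rng = random.Random(seed)
    n, dim = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(dim)]
    centered = [[c[j] - mean[j] for j in range(dim)] for c in curves]
    w = [0.0] * dim
    for t in range(1, steps + 1):
        i = rng.randrange(n)
        x, y = centered[i], labels[i]
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        eta = 1.0 / (lam * t)   # decaying step size
        shrink = 1.0 - 1.0 / t  # equals 1 - eta * lam (regularization shrinkage)
        if margin < 1.0:        # sample violates the margin: move toward it
            w = [shrink * wj + eta * y * xj for wj, xj in zip(w, x)]
        else:
            w = [shrink * wj for wj in w]
    return w, mean


def classify(curve, w, mean):
    """Return +1 or -1 according to the side of the hyperplane."""
    score = sum(wj * (cj - mj) for wj, cj, mj in zip(w, curve, mean))
    return 1 if score >= 0 else -1


def demo():
    """Train on 8 curves per variant, then score 20 unseen curves per variant."""
    rng = random.Random(42)
    variants = ((-1, 78.0), (1, 82.0))  # (label, hypothetical Tm)
    train_x, train_y = [], []
    for label, tm in variants:
        for _ in range(8):
            train_x.append(simulate_curve(tm, rng))
            train_y.append(label)
    w, mean = train_linear_svm(train_x, train_y)
    correct = total = 0
    for label, tm in variants:
        for _ in range(20):
            correct += classify(simulate_curve(tm, rng), w, mean) == label
            total += 1
    return correct / total


if __name__ == "__main__":
    print("held-out accuracy:", demo())
```

Because the training curves already include the simulated run-to-run shifts, the classifier learns a boundary that tolerates them, which is one plausible reading of how training data can build in robustness. Real melt curves from closely related sequences would be far harder to separate than this two-variant toy case.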

Yang and colleagues also developed a primer selection algorithm, and chose primer pairs in conserved regions flanking small hypervariable regions, giving the maximum number of distinguishable sequences by BLASTClust analysis. In the case of S. pneumoniae, seven primer pairs were needed to differentiate all 92 serotypes.
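The idea of choosing a small set of primer pairs that together tell every sequence apart can be illustrated with a greedy selection. The sketch below is an assumption-laden stand-in, not the authors' primer selection algorithm: it treats two strains as indistinguishable by a primer pair when their amplicons are identical (a simplification of similarity clustering such as BLASTClust), and it repeatedly picks the pair that separates the most still-unresolved strain pairs. The strain names, primer pair names, and sequences are made up for illustration.

```python
from itertools import combinations


def select_primer_pairs(amplicons):
    """Greedily pick primer pairs until all strains are distinguishable.

    amplicons maps primer-pair name -> {strain name -> amplicon sequence}.
    Returns (chosen primer pairs in order, set of strain pairs left unresolved).
    """
    strains = sorted(next(iter(amplicons.values())))
    chosen = []
    # Before any primer pair is chosen, every pair of strains is unresolved.
    remaining = set(combinations(strains, 2))
    while remaining:
        candidates = [p for p in sorted(amplicons) if p not in chosen]

        def gain(p):
            # Number of unresolved strain pairs this primer pair would split.
            return sum(1 for a, b in remaining if amplicons[p][a] != amplicons[p][b])

        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) == 0:
            break  # no remaining primer pair adds resolving power
        chosen.append(best)
        remaining = {
            (a, b) for a, b in remaining if amplicons[best][a] == amplicons[best][b]
        }
    return chosen, remaining


# Hypothetical toy data: three candidate primer pairs, four strains.
TOY_AMPLICONS = {
    "P1": {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA", "s4": "ACGA"},
    "P2": {"s1": "TTAG", "s2": "TTCG", "s3": "TTAG", "s4": "TTCG"},
    "P3": {"s1": "GGCA", "s2": "GGCA", "s3": "GGCA", "s4": "GGTA"},
}

if __name__ == "__main__":
    pairs, unresolved = select_primer_pairs(TOY_AMPLICONS)
    print(pairs, unresolved)  # ['P1', 'P2'] set()
```

In the toy data, P1 splits the strains into two groups and P2 splits each group again, so two primer pairs suffice and P3 is never needed. A greedy choice like this is not guaranteed to find the smallest possible set, and in practice melt curves, not exact sequence identity, would decide whether two amplicons are distinguishable.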

Importantly, the new algorithm also enabled learned tolerance for deviations in run-to-run reaction conditions.

There may be precedents for integrating this technology into the commercial space. For example, machine learning HRM "fingerprinting" is somewhat similar to analysis using PCR-electrospray ionization mass spectrometry (the basis of Abbott's Plex-ID platform), but the profiling technology is different, Yang said.

"Plex-ID uses mass spectrometry to profile PCR products, which is very costly and requires additional instrumentation," he said. "HRM can be performed directly on most qPCR instruments in less than five minutes with the simple addition of reagents, [a] melt curve database, and our algorithm."

Abbott recently launched a next-generation version of its Plex-ID platform, called Iridica, which is intended to identify pathogens in patient samples sent to hospital clinical labs. However, a 2012 study by JHU researchers showed that HRM compared favorably with Plex-ID in identifying pathogens from blood culture samples, as covered in PCR Insider.

"As PCR technology gets smaller, faster, and cheaper toward point-of-care use, we want to develop a complementary profiling technology that generates high-content sequence informatics without compromising speed, cost, or complexity," Yang said.

Intellectual property protection is already in place on the HRM algorithm, he said, and the group is "actively seeking commercialization partners."
