
Boston Team Applies Polysolver Method to Detect HLA Mutations in Whole-Exome Sequencing Data

NEW YORK (GenomeWeb) – A team of Boston-area researchers has validated a new analysis tool called Polysolver, one of several recent methods for inferring HLA type from whole-exome sequencing data.

In a study published last week in Nature Biotechnology, the team, led by researchers from Dana-Farber Cancer Institute and the Broad Institute, used Polysolver to detect numerous somatic mutations in HLA genes in whole-exome sequencing data from almost 8,000 pairs of tumor and matched normal samples.

Though there were discordances between the Polysolver-derived mutations and a set of alterations from a previous analysis by The Cancer Genome Atlas (TCGA), various validation steps upheld most of the Polysolver calls and showed that the method could be adjusted to recover most of the genuine TCGA-reported mutations it had initially missed.

Dana-Farber researcher Catherine Wu, who co-led the team with the Broad's Gad Getz, told GenomeWeb that while there are various strategies for independently deep sequencing the HLA regions, the genomics field has lacked tools that could accurately determine HLA type and search for mutations in the relatively shallow data from standard whole-exome sequencing.

"There are just thousands and thousands of exomes that are out there and until this method [and others like it] it was just not well accessed," Wu explained.

Researchers can currently use tools like targeted deep sequencing to obtain the desired information, "but with your standard short-read, low-to-moderate-coverage, what can you do then?" Wu said. "We just didn't have an approach to address this."

Since the team first started working on Polysolver, Wu added, several other HLA typing algorithms have been developed that perform about as well.

But the Dana-Farber/Broad team has demonstrated its tool's sensitivity in a relatively large set of samples, not only for inferring HLA type but also for detecting mutations in the HLA region.

According to Wu and her coauthors, Polysolver-based HLA typing involves two basic steps. First, the tool retrieves reads from the whole-exome sequencing (WES) data that potentially originate from the HLA region, based on matches to a tag library derived from all known HLA alleles.
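As a rough illustration of tag-based read retrieval, the Python sketch below builds a set of k-mers ("tags") from a toy dictionary of allele sequences and keeps any read that shares at least one tag on either strand. The allele names, sequences, and the small k are hypothetical; the actual tag length and matching rules Polysolver uses are not described in this article.

```python
# Minimal sketch of tag-library read retrieval (hypothetical k, toy sequences).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def build_tag_library(alleles: dict[str, str], k: int) -> set[str]:
    """Collect every k-mer from every known allele sequence."""
    tags: set[str] = set()
    for seq in alleles.values():
        for i in range(len(seq) - k + 1):
            tags.add(seq[i:i + k])
    return tags


def retrieve_candidate_reads(reads: list[tuple[str, str]], tags: set[str], k: int) -> list[str]:
    """Keep names of reads sharing at least one k-mer with the library, on either strand."""
    hits = []
    for name, seq in reads:
        for strand in (seq, reverse_complement(seq)):
            if any(strand[i:i + k] in tags for i in range(len(strand) - k + 1)):
                hits.append(name)
                break
    return hits


# Toy usage: two made-up allele sequences and three reads.
alleles = {"A*01": "ACGTACGTTT", "A*02": "ACGTTCGTTA"}
tags = build_tag_library(alleles, k=5)
reads = [("r1", "TACGTT"), ("r2", "GGGGGG"), ("r3", "AACGAA")]
print(retrieve_candidate_reads(reads, tags, k=5))  # ['r1', 'r3']
```

Read r3 only matches after reverse-complementing, which is why both strands are checked: sequencing reads can come from either strand of the HLA locus.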

These isolated reads are then aligned to a full-length genomic library of all known HLA alleles and only the best-scoring alignments are kept for use in subsequent steps.
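The "keep only the best-scoring alignments" step can be sketched as follows. This toy version uses simple ungapped +1/-1 scoring as a hypothetical stand-in for a real read aligner, then keeps every allele that ties for a read's top score.

```python
from typing import Optional

# Minimal sketch: score a read against every allele, keep only top-scoring hits.
# Ungapped +1 match / -1 mismatch scoring is a hypothetical stand-in for a real aligner.


def ungapped_score(read: str, ref: str) -> Optional[int]:
    """Best score over all ungapped placements of the read on the reference."""
    best: Optional[int] = None
    for offset in range(len(ref) - len(read) + 1):
        window = ref[offset:offset + len(read)]
        score = sum(1 if a == b else -1 for a, b in zip(read, window))
        if best is None or score > best:
            best = score
    return best  # None when the read is longer than the reference


def best_alignments(read: str, alleles: dict[str, str]) -> dict[str, int]:
    """Return only the allele(s) achieving the read's highest alignment score."""
    scores = {name: ungapped_score(read, seq) for name, seq in alleles.items()}
    scores = {name: s for name, s in scores.items() if s is not None}
    if not scores:
        return {}
    top = max(scores.values())
    return {name: s for name, s in scores.items() if s == top}


alleles = {"A*01": "ACGTACGTTT", "A*02": "ACGTTCGTTA"}
print(best_alignments("ACGTAC", alleles))  # {'A*01': 6}
print(best_alignments("ACGT", alleles))    # {'A*01': 4, 'A*02': 4}
```

Ties are kept deliberately: a short read drawn from a region shared by several alleles carries no information to distinguish them, and discarding all but one would bias the later allele inference.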

After this, a two-step Bayesian classification approach is used to infer the HLA alleles, taking into account the base qualities of aligned reads, observed insert sizes, and the ethnicity-dependent prior probabilities of each allele, according to the authors.
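To show how base qualities and allele priors can combine in a Bayesian score, here is a deliberately simplified sketch: each allele's posterior is its prior times the likelihood of the reads, with each base's error probability taken from its Phred quality. It assumes toy, equal-length alleles in a shared coordinate system, and it omits the insert-size term and the two-step structure of Polysolver's classifier, so it illustrates the general idea rather than the published method.

```python
import math


def read_log_likelihood(bases: str, quals: list[int], ref_segment: str) -> float:
    """Log P(read | allele), assuming independent bases with Phred-scaled error rates."""
    ll = 0.0
    for base, q, ref_base in zip(bases, quals, ref_segment):
        err = 10 ** (-q / 10)  # Phred quality -> probability the base call is wrong
        ll += math.log(1 - err) if base == ref_base else math.log(err / 3)
    return ll


def allele_posteriors(reads, alleles, priors):
    """Posterior P(allele | reads) for each candidate, normalized over the candidates.

    reads:   list of (bases, quals, offset); offsets shared by all alleles (toy assumption)
    alleles: name -> sequence
    priors:  name -> prior probability (e.g. a population-specific allele frequency)
    """
    log_post = {}
    for name, seq in alleles.items():
        lp = math.log(priors[name])
        for bases, quals, offset in reads:
            lp += read_log_likelihood(bases, quals, seq[offset:offset + len(bases)])
        log_post[name] = lp
    top = max(log_post.values())  # subtract the max before exponentiating, for stability
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {name: math.exp(v - top) / norm for name, v in log_post.items()}


alleles = {"A*01": "ACGTACGTTT", "A*02": "ACGTTCGTTA"}
reads = [("TAC", [30, 30, 30], 3)]  # matches A*01 exactly, mismatches A*02 at one base
print(allele_posteriors(reads, alleles, {"A*01": 0.5, "A*02": 0.5}))
```

With no reads at all, the posteriors simply fall back to the priors, which is how population-dependent allele frequencies can tip the balance when coverage is thin.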

To identify mutations, the patient's HLA alleles, as inferred by Polysolver, then serve as the reference against which germline and tumor reads are compared, revealing cancer-specific mutations in HLA genes.
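The idea of using the patient's own allele as the reference can be sketched with a toy pileup: count bases at each position in tumor and normal reads, then flag non-reference bases that are well supported in the tumor but absent from the normal. The read counts and the thresholds (`min_alt`, `min_vaf`, `max_normal_alt`) are hypothetical placeholders; a real pipeline would rely on a dedicated somatic variant caller rather than this simple filter.

```python
from collections import Counter


def pileup(reads: list[tuple[int, str]], length: int) -> list[Counter]:
    """Count observed bases at each reference position; reads are (offset, bases)."""
    columns = [Counter() for _ in range(length)]
    for offset, bases in reads:
        for i, base in enumerate(bases):
            pos = offset + i
            if 0 <= pos < length:
                columns[pos][base] += 1
    return columns


def call_somatic(ref, tumor_reads, normal_reads, min_alt=3, min_vaf=0.1, max_normal_alt=0):
    """Flag positions where the tumor carries a non-reference base the normal lacks.

    Thresholds are hypothetical placeholders, not the study's published filters.
    """
    tumor = pileup(tumor_reads, len(ref))
    normal = pileup(normal_reads, len(ref))
    calls = []
    for pos, ref_base in enumerate(ref):
        depth = sum(tumor[pos].values())
        for alt, count in tumor[pos].items():
            if alt == ref_base:
                continue
            if count >= min_alt and count / depth >= min_vaf and normal[pos][alt] <= max_normal_alt:
                calls.append((pos, ref_base, alt, count, count / depth))
    return calls


# The patient's own inferred allele serves as the reference sequence (toy data).
patient_allele = "ACGTACGTTT"
tumor_reads = [(0, "ACGAAC")] * 3 + [(0, "ACGTAC")] * 3
normal_reads = [(0, "ACGTAC")] * 4
print(call_somatic(patient_allele, tumor_reads, normal_reads))  # [(3, 'T', 'A', 3, 0.5)]
```

Aligning to the patient's own allele matters because HLA alleles differ from the standard reference genome at many positions; against a generic reference, those germline differences would look like variants in both tumor and normal and obscure the true somatic changes.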

In their study published last week, Wu and her colleagues applied Polysolver to WES data from a total of 7,930 tumor-normal pairs, after initially validating the approach in 253 HapMap samples with known HLA genotypes.

In the HapMap analysis, Polysolver demonstrated a mean sensitivity of 97 percent, a mean precision of 98.8 percent, and a mean overall accuracy of 97 percent in correctly calling all alleles in a sample. In addition, the method was 100 percent successful in calling homozygous cases.
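For readers less familiar with these metrics, the sketch below computes per-sample sensitivity (fraction of true alleles recovered), precision (fraction of called alleles that are correct), and an all-alleles-correct flag. Each genotype is treated as a multiset so that a homozygous allele counts twice. These are common definitions; the paper's exact formulas may differ, and the allele names are illustrative only.

```python
from collections import Counter


def typing_metrics(truth: list[str], called: list[str]) -> tuple[float, float, bool]:
    """Per-sample sensitivity, precision, and whether every allele was called correctly."""
    true_counts, called_counts = Counter(truth), Counter(called)
    tp = sum((true_counts & called_counts).values())  # multiset intersection
    sensitivity = tp / len(truth) if truth else 0.0
    precision = tp / len(called) if called else 0.0
    return sensitivity, precision, true_counts == called_counts


# Six alleles (two each at HLA-A, -B, -C), one miscalled at HLA-B.
truth = ["A*01:01", "A*02:01", "B*07:02", "B*08:01", "C*07:01", "C*07:02"]
called = ["A*01:01", "A*02:01", "B*07:02", "B*44:02", "C*07:01", "C*07:02"]
print(typing_metrics(truth, called))  # sensitivity and precision both 5/6

# A homozygous locus: reporting the allele only once recovers half the genotype.
print(typing_metrics(["A*01:01", "A*01:01"], ["A*01:01"]))  # (0.5, 1.0, False)
```

The homozygous case shows why homozygous samples are singled out: a caller that assumes two distinct alleles can lose sensitivity there even when every allele it names is correct.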

Compared to a set of other HLA inference algorithms, Polysolver outperformed four earlier-generation approaches, the authors reported, but appeared equivalent, as Wu noted, to at least one newer method called OptiType, developed by researchers at the University of Tübingen in Germany.

After this initial validation, the Dana-Farber/Broad scientists set out to apply the method to the detection of HLA mutations. To test this, they first assembled a dataset of 2,545 cases of matched tumor and germline DNA from 12 tumor types: ten from TCGA, and another two from separate genomic studies of chronic lymphocytic leukemia (CLL) and melanoma.

From these samples, 59 somatic HLA mutations had already been reported using standard genomic analysis as part of the pan-cancer TCGA effort.

When the team reanalyzed the data from these same cases using Polysolver, it detected 36 of the 59 previously reported mutations, as well as 37 novel somatic HLA alterations.
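The overlap between the two call sets can be summarized with basic set operations. In the sketch below, the mutation IDs are placeholders chosen only to reproduce the reported counts (59 TCGA calls, 36 shared, 37 found only by Polysolver); they do not correspond to real variants.

```python
def compare_call_sets(reference: set[str], new: set[str]) -> dict[str, int]:
    """Summarize overlap between a reference call set and a new call set."""
    return {
        "shared": len(reference & new),
        "missed": len(reference - new),  # reported previously, not found by the new method
        "novel": len(new - reference),   # found only by the new method
        "union": len(reference | new),   # every mutation under discussion
    }


# Placeholder IDs chosen only to reproduce the reported counts.
tcga_calls = {f"m{i}" for i in range(59)}
polysolver_calls = {f"m{i}" for i in range(23, 59)} | {f"n{i}" for i in range(37)}
print(compare_call_sets(tcga_calls, polysolver_calls))
# {'shared': 36, 'missed': 23, 'novel': 37, 'union': 96}
```

The 23 "missed" calls are the discordant TCGA mutations examined next, and the union of 96 appears to match the 96 mutations referred to in the later RNA-seq comparison.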

Examining first the 23 previously reported mutations that Polysolver did not identify, the group found that only nine appeared to be true events, and six of these seemed to fall just below the detection limit of the Polysolver pipeline. With slightly relaxed filtering criteria, these mutations could be pushed above that threshold, the authors reported.
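The effect of relaxing filters can be illustrated with a toy example: a candidate mutation is kept only if it has a minimum number of supporting tumor reads and a minimum variant allele fraction, and loosening both thresholds admits a borderline event while still rejecting a likely artifact. The candidate names, read counts, and threshold values here are hypothetical, not the filters used in the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    alt_reads: int  # tumor reads supporting the variant
    depth: int      # total tumor reads covering the position


def passes(c: Candidate, min_alt: int, min_vaf: float) -> bool:
    """Keep a candidate only if both read support and allele fraction clear the bar."""
    return c.alt_reads >= min_alt and c.alt_reads / c.depth >= min_vaf


candidates = [Candidate("clear", 12, 40), Candidate("borderline", 3, 40), Candidate("noise", 1, 40)]
strict = [c.name for c in candidates if passes(c, min_alt=4, min_vaf=0.10)]
relaxed = [c.name for c in candidates if passes(c, min_alt=3, min_vaf=0.05)]
print(strict)   # ['clear']
print(relaxed)  # ['clear', 'borderline']
```

The trade-off is that looser thresholds also raise the chance of admitting sequencing artifacts, so such adjustments are typically paired with orthogonal checks such as RNA-seq or targeted sequencing.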

The investigators also used RNA-seq data and targeted HLA sequencing to further explore the discordances between the initial TCGA findings and their own results using Polysolver.

Checking against RNA-seq data, which was available for 49 of the 96 mutations in question, the group was able to validate the majority of the mutations identified exclusively by Polysolver, but only two of 10 that were unique to the initial TCGA analysis.

Finally, using direct targeted HLA sequencing of 12 TCGA samples, the team also confirmed 11 mutations inferred by Polysolver, six of which were not picked up by the initial TCGA analysis.

Overall, the authors wrote, the HLA mutational spectrum elucidated by their Polysolver analysis "significantly reduced false positives and detect[ed] additional somatic mutations" in comparison with previous studies.

To look more broadly at patterns of HLA mutation, the researchers eventually expanded their analysis to a total of 7,930 TCGA tumor-normal pairs covering 20 tumor types, in which they detected a total of 298 somatic HLA mutations in 266 individuals (about 3 percent of the cohort).
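The cohort-level figures can be checked with simple arithmetic: the fraction of individuals carrying at least one somatic HLA mutation, and the average number of mutations per affected individual.

```latex
\frac{266}{7930} \approx 0.034 \quad (\text{about } 3\%), \qquad
\frac{298}{266} \approx 1.12 \ \text{mutations per affected individual}
```

So most affected individuals carried a single somatic HLA mutation, with a minority carrying more than one.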

Among the biological insights gleaned from these data was the presence of several recurrent mutation sites, or hotspots (29 sites that were recurrently mutated in at least three cases, and 35 sites mutated in at least two cases), which suggested positive selection at these positions.
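Identifying hotspots of this kind starts with grouping mutations by site and counting distinct patients per site. The sketch below does only that counting; the site labels and patients are made up, and establishing positive selection would additionally require comparing against a background mutation model, which is not shown.

```python
from collections import defaultdict


def recurrent_sites(mutations: list[tuple[str, str]], min_cases: int) -> list[str]:
    """Return sites mutated in at least `min_cases` distinct patients, sorted by label.

    mutations: list of (patient_id, site) pairs; labels here are hypothetical.
    """
    patients_by_site: dict[str, set[str]] = defaultdict(set)
    for patient, site in mutations:
        patients_by_site[site].add(patient)  # a set, so repeat calls in one patient count once
    return sorted(site for site, patients in patients_by_site.items() if len(patients) >= min_cases)


mutations = [
    ("p1", "HLA-A:100"), ("p2", "HLA-A:100"), ("p3", "HLA-A:100"),
    ("p1", "HLA-B:50"), ("p4", "HLA-B:50"),
    ("p5", "HLA-C:10"), ("p5", "HLA-C:10"),  # same patient twice: still one case
]
print(recurrent_sites(mutations, min_cases=2))  # ['HLA-A:100', 'HLA-B:50']
print(recurrent_sites(mutations, min_cases=3))  # ['HLA-A:100']
```

Counting distinct patients rather than raw mutation records keeps a single heavily sampled tumor from masquerading as a recurrent hotspot.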

The group also saw that colon adenocarcinoma was significantly affected by somatic mutations in class I HLA genes, as has been found in other tumor types such as head and neck, lung, and stomach cancers.

In contrast, other tumor types such as glioblastoma, ovarian cancer, and CLL largely lacked mutations in the HLA gene region, according to the Polysolver analysis.

Wu said the group could not provide details about its ongoing applications of Polysolver, but the study's authors noted that the method could be useful not only for mutation detection in HLA, but also for other polymorphic loci or areas of the genome that pose a particular difficulty for mutation detection in the context of shallower whole-exome sequencing.

"Frankly, there are many other loci in the genome that in general we turn a blind eye to because they are problem children," Wu said. "This notion of creating precise alignment — an inference method to figure out the alleles — is a framework that is applicable to other highly polymorphic regions."