Johns Hopkins Team Debuts Method for Genome-Wide Detection of Cancer-Linked Repeat Sequences


NEW YORK – Investigators at Johns Hopkins University have developed a tool for detecting changes in repetitive sequences from whole-genome sequencing and shared initial data on how it could be used to detect early-stage cancer in blood.

Although alterations of repeat sequences have been implicated in cancer before, studies of these genetic changes had previously been limited to specific sequences or classes of repeats, said Victor Velculescu, the study's senior author and codirector of cancer genetics and epigenetics at Johns Hopkins.

For example, researchers from the Translational Genomics Research Institute and City of Hope are advancing a liquid biopsy method called A-PLUS, which targets a subset of repetitive regions called Alu elements, with positive early results in multi-cancer detection.

The new method, dubbed ARTEMIS (analysis of repeat elements in disease), for the first time enables the examination of the much larger population of repeat sites, where there could be even more cancer-linked signals, Velculescu said.

"Individually, certain elements have been shown to be very interesting biomarkers, but we won't know what other interesting ones there are unless we look at all of them," added Akshaya Annapragada, the study's first author and an M.D./Ph.D. student at Johns Hopkins.

In their study, published in Science Translational Medicine on Wednesday, Annapragada, Velculescu, and colleagues used ARTEMIS to explore this broader landscape of repeats, identifying widespread changes in repeat landscapes of human cancers, including specific repeat elements not previously implicated in the disease.

"For the first time, you can find all the parts of the genome that are affected through changes in repeats, either because there's an increase or decrease in the repeat elements, and that just has not been possible before," Velculescu said. "It's like saying, we can sequence a gene, and then somebody else says, well, we just sequenced the genome. It's a very different sort of level of understanding."

To develop the ARTEMIS method, the Hopkins team started from scratch, searching the recently updated T2T reference genome for kmers, or short sequences, that were specific to one of 1,266 recently identified repeat types. On average, each of these repeat types was defined by around 43,000 24-base-pair kmers spanning an average of 2.6 Mb of genome sequence.

Another 58,000 24-bp kmers were derived from enhanced annotations of 14 human satellite subtypes, allowing for coverage of genome regions that could not be aligned with high quality in typical short-read next-generation sequencing. Further analyses confirmed that the collection of kmers was not confounded by microbial DNA.
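As a rough illustration of what "kmers specific to one repeat type" means, the following Python sketch keeps only the kmers that occur in exactly one repeat type. It is not the authors' pipeline: the function name, the input layout (a dictionary of repeat type to example sequences), and the toy data are assumptions made for illustration, and a real implementation would also have to check kmers against the rest of the genome and handle reverse complements.

```python
from collections import defaultdict


def type_specific_kmers(repeat_seqs, k=24):
    """Return {repeat_type: set of kmers seen only in that repeat type}.

    repeat_seqs is a hypothetical input: {repeat_type: [sequence, ...]}.
    """
    owners = defaultdict(set)  # kmer -> repeat types that contain it
    for rtype, seqs in repeat_seqs.items():
        for seq in seqs:
            seq = seq.upper()
            for i in range(len(seq) - k + 1):
                owners[seq[i:i + k]].add(rtype)

    specific = {rtype: set() for rtype in repeat_seqs}
    for kmer, types in owners.items():
        if len(types) == 1:  # keep kmers that belong to a single type
            specific[next(iter(types))].add(kmer)
    return specific


# Toy example with k=4 so the result is easy to read by eye.
toy = {"typeA": ["ACGTAC"], "typeB": ["CGTACC"]}
result = type_specific_kmers(toy, k=4)
print(result)
```

In the toy data, the kmers CGTA and GTAC appear in both types and are discarded, which leaves one defining kmer per type.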

When the investigators looked at the genome-wide distribution of the 1.2 billion repeat-defining kmers in ARTEMIS, they found, as expected, that repeat elements tended to be enriched in genes commonly altered in human cancer. Of the 736 genes in the COSMIC cancer driver gene census, 487 had a higher-than-expected number of repeat kmer sequences within their exonic or intronic sequences. Repeat sequences were also more frequent in pathways commonly dysregulated in cancer, including cell adhesion, growth, and signaling, as well as gene sets specific to cancer type, the authors reported.

Having established ARTEMIS's library of kmers, the team then set out to analyze a set of matched tumor and normal tissue samples from 525 patients with cancer, including breast, lung, colorectal, liver, thyroid, head and neck squamous cell, ovarian, gastric, bladder, cervical, and prostate tumors. According to the authors, the tumor samples featured 246 to 1,280 repeat elements with higher or lower kmer counts compared to their normal matches. Among the 1,280 total repeat elements, nearly two-thirds were not previously known as cancer biomarkers.
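The tumor-versus-normal comparison above can be pictured as counting, for each repeat type, how many of its defining kmers show up in the sequencing reads of each sample and then comparing the depth-normalized counts. The sketch below is a simplified stand-in, not the published method: the function names, the lookup table `kmer_to_type`, the pseudocount, and the toy reads are all assumptions for illustration.

```python
import math
from collections import Counter


def count_repeat_kmers(reads, kmer_to_type, k=24):
    """Count occurrences of repeat-defining kmers per repeat type."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            rtype = kmer_to_type.get(read[i:i + k])
            if rtype is not None:
                counts[rtype] += 1
    return counts


def log2_ratio(tumor, normal, tumor_depth, normal_depth, pseudo=1.0):
    """Depth-normalized log2 tumor/normal ratio for each repeat type."""
    ratios = {}
    for rtype in set(tumor) | set(normal):
        t = (tumor.get(rtype, 0) + pseudo) / tumor_depth
        n = (normal.get(rtype, 0) + pseudo) / normal_depth
        ratios[rtype] = math.log2(t / n)
    return ratios


# Toy example with k=4 and one hypothetical repeat-defining kmer.
kmer_to_type = {"ACGT": "L1"}
tumor_counts = count_repeat_kmers(["ACGTACGT"], kmer_to_type, k=4)
normal_counts = count_repeat_kmers(["ACGTTTTT"], kmer_to_type, k=4)
ratios = log2_ratio(tumor_counts, normal_counts, tumor_depth=1, normal_depth=1)
print(tumor_counts, normal_counts, ratios)
```

A positive ratio suggests more copies of a repeat type in the tumor than in the matched normal, and a negative ratio suggests fewer.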

"Changes in kmer repeat landscapes were highly complex, with no two patients studied having the same set of alterations," the authors wrote.

Using machine learning, the Hopkins team generated an "ARTEMIS score" for each sample: a single number representing a quantitative summary of genome-wide repeat element changes. Despite germline variability of repeat elements among different individuals, the authors wrote that these ARTEMIS scores were able to distinguish the 525 tumor tissue samples with high performance, represented by an area under the receiver operating characteristic curve (AUC) of 0.96.
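For readers unfamiliar with the AUC metric cited here, it can be read as the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen non-cancer sample. The sketch below computes it directly from that definition; the function name and the scores are made up for illustration and are not data from the study.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: higher should mean "more tumor-like".
tumor_scores = [0.9, 0.8, 0.4]
normal_scores = [0.5, 0.3, 0.2]
value = auc(tumor_scores, normal_scores)
print(round(value, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which puts the reported 0.96 close to the top of the scale.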

Notably, ARTEMIS also demonstrated an ability to distinguish different tissue types, or tissue of origin for the cancers tested, with up to 83 percent accuracy.

Finally, the investigators tested the method on liquid biopsy samples from two clinical cohorts — one in lung cancer and the other in liver cancer — that they had previously tested with a fragmentomic classifier that is currently being commercialized by Velculescu and colleagues through their spinout company Delfi.

According to the authors, ARTEMIS demonstrated an AUC of 0.82 in distinguishing lung cancer. When combined with the DELFI fragmentomic approach, the joint AUC was boosted to 0.91. Much the same was seen in the liver cancer patients.

After locking in the ARTEMIS and ARTEMIS-DELFI models, the team validated them in an external cohort composed of 400 noncancer individuals at high and average risk of lung cancer and 88 known cancer samples, reporting similar performance.

Although Delfi hasn't yet made any moves from the commercial side to integrate ARTEMIS, Velculescu said that a combined assay could make sense because repeat elements and fragmentomic profiles can be measured from the same underlying whole-genome sequencing data. "Obviously this is a prototype," he said. "It's a way to sort of show the power of this … but, there's obviously an opportunity here to really validate this on a large scale, so we're very excited and interested to do that as we go forward."

"All these different features bring additional information that you normally wouldn't have, and in this case, I think it's even more clear because you're looking at portions of the genome which are the junk DNA, the dark matter," Velculescu said.

In addition, because of the paucity of tumor DNA that passes into the blood, liquid biopsy testing, especially for early cancer detection, can only benefit from the addition of more targets, added Annapragada.

"Even if [these features] reflect similar patterns, if we can add additional ones, you get more shots at the target," she said.