SAN FRANCISCO (GenomeWeb) – Identifying structural variants from fixed tumor samples has long posed a challenge due to the fragmented nature of the DNA once it is fixed. But researchers from Stanford University and Dovetail Genomics believe they have come up with a solution, modifying the Hi-C protocol for chromosome conformation capture to work on formalin-fixed paraffin-embedded samples.
Helio Costa, an instructor in the departments of pathology and biomedical data science at Stanford, described the method during a presentation and follow-up interview at last week's Advances in Genome Biology and Technology meeting in Orlando, Florida. The team also posted a publication to the BioRxiv server last week that describes the method.
Dovetail is now offering the method, dubbed Fix-C, as a service and plans to develop it into a commercial kit by the end of the year. The Fix-C service costs $2,100 per sample and includes the Fix-C sample prep, sequencing on an Illumina platform, and structural variant analysis using Dovetail's Selva software.
Working with FFPE samples to analyze tumor genomes is challenging because the DNA is highly fragmented and often degraded. What's more, detecting mutations can be further complicated by tumor purity and heterogeneity. And detecting structural variants is even more challenging than detecting point mutations, because short fragments are unlikely to span a breakpoint.
Using a long-read sequencing technology on such specimens also does not help, since the fragments are so short. "The chances of capturing reads that straddle breakpoints are not very likely," Costa said.
Nonetheless, FFPE tissue is typically the specimen that is available for analyzing clinical samples.
The researchers turned to a method that has been used recently to capture long-range genomic information: Hi-C. The technique was originally developed in 2009 and combines proximity-based ligation with NGS to capture chromatin interactions. Since then, researchers have used it for a number of applications, including haplotyping, structural variant identification, and metagenomics. In addition, companies like Dovetail and Phase Genomics now offer commercial services and kits.
Typically, the standard Hi-C protocol includes a fixation step to create crosslinks between histones and other proteins to capture the 3D organization of chromatin. However, FFPE samples are already fixed, so that step is not needed for the Fix-C protocol, explained Dovetail VP of Commercial Operations Veronica Mankinen.
Within a living cell, there are physical interactions between the chromatin proteins, Costa said. During the FFPE fixation process, those interactions are frozen in place. The Fix-C protocol aims to take advantage of that to identify structural variants.
First, the chromatin aggregates are extracted in such a way as to retain the cross-linked DNA-histone complexes while discarding naked DNA that is not bound to a histone. "These globules retain a lot of structural information," Costa said.
Next, the DNA is digested, and the resulting fragments are biotinylated and ligated so that fragments in close physical proximity are joined to one another. This proximity ligation step generates biotin-marked DNA fragments that are chimeras of two genomic regions that were physically close together. Finally, the crosslinks are removed and a sequencing library is created.
After sequencing, the reads are mapped back to a reference genome. Each read pair corresponds to a ligation event, and ligation occurred between DNA fragments that were physically close to each other, Costa said. "So, we know these DNA molecules are adjacent in the genome," he said. "When we sequence and map back, we can infer whether there were structural variant events." For instance, he said, if a given ligated read pair maps back to two different locations in the genome, that can be indicative of a structural variant.
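The read-pair logic Costa describes can be illustrated with a minimal Python sketch. This is not Dovetail's Selva software; the `ReadPair` type, the `classify_pair` function, the 1 Mb `far_cutoff`, and the coordinates are all hypothetical choices made for illustration. A single long-range pair is not evidence of a rearrangement on its own, since proximity ligation also produces background long-range contacts, so the sketch only tallies pairs by category.

```python
from collections import Counter
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Mapped positions of the two reads in one ligated pair (hypothetical type)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair, far_cutoff=1_000_000):
    """Label a pair by how far apart its two ends map on the reference.

    far_cutoff is an arbitrary illustrative threshold, not a published value.
    """
    if pair.chrom1 != pair.chrom2:
        return "inter_chromosomal"
    if abs(pair.pos1 - pair.pos2) >= far_cutoff:
        return "long_range"
    return "short_range"


def summarize(pairs, far_cutoff=1_000_000):
    """Count how many pairs fall into each category."""
    return Counter(classify_pair(p, far_cutoff) for p in pairs)


# Made-up coordinates, chosen only to exercise each category.
pairs = [
    ReadPair("chr2", 29_400_000, "chr2", 29_405_000),
    ReadPair("chr2", 29_450_000, "chr2", 42_300_000),
    ReadPair("chr6", 117_300_000, "chr10", 61_000_000),
]
print(summarize(pairs))
```

In practice the signal comes from many such pairs piling up between the same two regions, which is what the visual analysis looks for.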
The analysis is done visually, by creating a histogram of two chromosomes and the frequencies of increasing distances spanned by reads in a pair. For chromosomes where there are no structural variants, the read pairs cluster close to the baseline. If there is a structural variant, that can be seen by an aberrant cluster further away.
The result is a set of pixels that represent ligation events between reads. These pixels form a triangle shape and are very dense at the base of the triangle, representing ligation events between reads that are close to each other in the genome. A dense cluster of pixels further away from the base can indicate a structural variant, since it represents ligation events between reads that are not close to each other in the genome, Costa explained.
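To make the pixel picture concrete, the following sketch bins read pairs from a single chromosome into a small contact map and prints its upper triangle as text. It is an illustration under stated assumptions, not the researchers' pipeline: `bin_contacts`, `render`, the bin size of 100, and the toy coordinates are all invented here.

```python
from collections import Counter


def bin_contacts(pairs, bin_size):
    """Count ligation events per pixel (bin_i, bin_j), stored with bin_i <= bin_j."""
    pixels = Counter()
    for pos1, pos2 in pairs:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        pixels[(i, j)] += 1
    return pixels


def render(pixels, n_bins, dense=3):
    """Draw the upper triangle: '#' dense pixel, '+' sparse pixel, '.' empty."""
    lines = []
    for i in range(n_bins):
        cells = []
        for j in range(n_bins):
            if j < i:
                cells.append(" ")
            else:
                count = pixels.get((i, j), 0)
                cells.append("#" if count >= dense else "+" if count else ".")
        lines.append("".join(cells))
    return "\n".join(lines)


# Toy data: many short-range pairs, plus one cluster joining distant bins 1 and 6.
pairs = []
for b in range(8):
    pairs += [(b * 100 + 10, b * 100 + 60)] * 5          # same-bin contacts
for b in range(7):
    pairs += [(b * 100 + 90, (b + 1) * 100 + 10)] * 2    # neighboring-bin contacts
pairs += [(650, 150)] * 4                                # distant cluster

pixels = bin_contacts(pairs, bin_size=100)
print(render(pixels, n_bins=8))
```

In this rendering the diagonal plays the role of the base of the triangle, and the isolated `#` away from it corresponds to the kind of distant cluster described above.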
To validate the method, Costa and his team applied it to 15 clinical adenocarcinoma and sarcoma FFPE samples that had previously been tested by FISH and/or RNA sequencing. FISH testing had previously identified fusions in 10 samples, and Fix-C was concordant for 90 percent of them. The Fix-C method failed to detect one known fusion because the "fusion genes are in such close proximity that it was hard to detect the signal from the noise," Costa said. In addition, Fix-C identified a ROS1 fusion in a sample previously called negative by FISH but confirmed as positive by RNA sequencing. The Fix-C method also detected a confirmed ALK fusion in a sample where FISH probes failed.
The researchers next wanted to see how Fix-C performed on samples where structural variants were not known. Costa described one adenoid cystic carcinoma sample that the researchers analyzed via Fix-C and FISH. FISH detected one fusion event involving the MYB gene, "but what's really striking is that there were a lot of rearrangement events detected with Fix-C," he said. In the BioRxiv publication, the researchers elaborate on the findings, describing how the method identified complex rearrangements.
Finally, the team described how Fix-C can be used to analyze topologically associated domains (TADs). Chromosomes in cells organize into these so-called TADs, which have been shown to be involved in gene expression regulation. In addition, there has been some recent research showing that some gene rearrangements that impact TAD reorganization can also have implications for cancer. "Because this method preserves the 3D conformation of chromatin, we can resolve these topologically associated domains," Costa said.
Further research would be needed to validate the method for TAD detection as well as to better understand the clinical impact of TAD reorganization, he added.
Costa said his lab plans to keep evaluating the method on additional clinical samples, first by evaluating samples that contain known structural variants and validating that the method can identify those events, and then "working to do broad scale scanning to identify novel structural variants, particularly complex structural variants," he said.
The method itself is relatively straightforward to use, he said, with the main difference on the laboratory side being the upstream step of extracting the chromatin complexes. The other major difference is in the informatics, which he said is "more nuanced" than for RNA sequencing or FISH.
In his initial tests of the method, much of the analysis was done manually, visually looking at the histograms and determining whether or not a structural variant was indicated. But now, he said, Dovetail has developed a machine learning-based method to scan and look at pixel density.
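The article does not describe how Dovetail's machine learning-based method works, so the sketch below is not that method. It shows a much simpler, hypothetical heuristic for scanning pixel density: each pixel is compared with the average of the other pixels at the same distance from the diagonal, and pixels that stand well above that background are flagged. The function name `scan_pixels`, the `min_score` threshold, and the `pseudocount` are assumptions made for illustration.

```python
def scan_pixels(pixels, n_bins, min_score=3.0, pseudocount=1.0):
    """Flag pixels that are dense relative to other pixels at the same offset.

    pixels maps (i, j) with i <= j < n_bins to a ligation-event count.
    The score is count / (mean of the other pixels at offset j - i + pseudocount).
    """
    # Total count along each diagonal offset d = j - i.
    totals = {
        d: sum(pixels.get((i, i + d), 0) for i in range(n_bins - d))
        for d in range(n_bins)
    }
    hits = []
    for (i, j), count in pixels.items():
        d = j - i
        n_others = (n_bins - d) - 1
        others_mean = (totals[d] - count) / n_others if n_others > 0 else 0.0
        score = count / (others_mean + pseudocount)
        if score >= min_score:
            hits.append(((i, j), round(score, 2)))
    return sorted(hits)


# Toy map: dense diagonal, sparser first off-diagonal, one distant cluster.
toy = {(b, b): 5 for b in range(8)}
toy.update({(b, b + 1): 2 for b in range(7)})
toy[(1, 6)] = 4

print(scan_pixels(toy, n_bins=8))
```

Comparing against pixels at the same offset matters because contact frequency falls off with genomic distance, so a count that is unremarkable near the diagonal can be striking far from it.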
In the future, he said, it has the potential to be used as a clinical tool, "but there's a lot of work needed to validate it." Currently, he said, "it's a great exploratory research tool." In particular, he said, he is interested in using it to look for signatures that are indicative of patient response to therapy or prognosis.