CHICAGO (GenomeWeb) – Working with genetic researchers in the US and Europe, scientists at Illumina have developed software that identifies repeat expansions in genomic data to assist in early detection of rare diseases such as Huntington's disease, amyotrophic lateral sclerosis, Friedreich's ataxia, and fragile X syndrome. Based on early results, the system, known as ExpansionHunter, has proven to be highly accurate.
In research presented at Cold Spring Harbor Laboratory's genome informatics conference this month, ExpansionHunter correctly identified expansions or potential expansions in all 212 expanded samples from ALS patients who had been tested for the repeat expansion in the C9orf72 gene. The system also was 99.9 percent accurate in classifying 2,789 other samples as wild-type; it tagged just three as potential expansions, according to the poster, which CSHL published in its Genome Research journal this fall.
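As a quick check on those figures, the short Python sketch below recomputes the reported rates from the counts given above. The variable names and the labels "sensitivity" and "specificity" are illustrative choices made here, not terms taken from the poster.

```python
# Counts as reported in the article; the labels are illustrative assumptions.
expanded_total = 212        # samples known to carry the C9orf72 expansion
expanded_flagged = 212      # of those, flagged as expansion or potential expansion
wild_type_total = 2789      # samples without the expansion
wild_type_flagged = 3       # wild-type samples tagged as potential expansions

# Share of expanded samples that were flagged.
sensitivity = expanded_flagged / expanded_total

# Share of wild-type samples that were correctly left unflagged.
specificity = (wild_type_total - wild_type_flagged) / wild_type_total

# The two groups together should account for the whole cohort.
cohort_size = expanded_total + wild_type_total

print(f"sensitivity: {sensitivity:.1%}")
print(f"specificity: {specificity:.1%}")
print(f"cohort size: {cohort_size}")
```

The two groups add up to 3,001 samples, and 2,786 of 2,789 wild-type calls rounds to the 99.9 percent cited.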
Illumina collaborated with a DNA research consortium called Project MinE, which provided samples from 3,001 patients with ALS. "They're trying to understand the genetic basis behind this disease, and [repeat] expansions is one of the important variants that causes it," said Egor Dolzhenko, a bioinformatics scientist at Illumina.
"On the surface, the changes might appear subtle," he said, but the new generation of sequencers produces cleaner results than older versions. "We can see inside all the regions, including many regions that we could not see into before," said Dolzhenko, who joined Illumina in March 2016 to lead the ExpansionHunter effort.
"By combining new PCR-free sequencing together with accurate mathematics methods, we can basically start calling things that traditionally people thought were not possible for the data that's produced," Dolzhenko explained.
"Before, people thought that if you had [one of these diseases], that's it, it's kind of hopeless. Now, with CRISPR and other new technologies, these things could potentially be either cured or made much less severe," he continued.
San Diego-based Illumina developed the open-source ExpansionHunter to scan PCR-free, short-read whole-genome sequencing data to detect repeat expansions long before visible symptoms appear. ExpansionHunter identifies reads inside and flanking repeat expansions, then estimates the sizes of alleles for each repeat.
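To make the idea of sorting reads and sizing a repeat more concrete, here is a deliberately simplified Python sketch. It is not ExpansionHunter's actual algorithm: it uses exact string matching, ignores sequencing errors and read pairing, and does not separate two alleles. The flank sequences, function names, and the "spanning" category are hypothetical choices for illustration; the CAG unit is the repeat expanded in Huntington's disease.

```python
import re

def count_units(seq: str, unit: str) -> int:
    """Number of units in the longest uninterrupted run of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

def classify_read(read: str, unit: str, left_flank: str, right_flank: str) -> str:
    """Toy classification of a read relative to one repeat locus."""
    has_left = left_flank in read
    has_right = right_flank in read
    if has_left and has_right:
        return "spanning"    # covers the whole repeat, so its size is exact
    if has_left or has_right:
        return "flanking"    # starts or ends inside the repeat
    # Allow a partial unit at each end, since a read may begin mid-unit.
    slack = 2 * (len(unit) - 1)
    if count_units(read, unit) * len(unit) >= len(read) - slack:
        return "in-repeat"   # consists almost entirely of repeat units
    return "other"

def summarize_locus(reads, unit, left_flank, right_flank):
    """Exact sizes from spanning reads, plus a lower bound from the rest."""
    spanning_sizes = set()
    lower_bound = 0
    for read in reads:
        kind = classify_read(read, unit, left_flank, right_flank)
        units = count_units(read, unit)
        if kind == "spanning":
            spanning_sizes.add(units)
        elif kind in ("flanking", "in-repeat"):
            lower_bound = max(lower_bound, units)
    return {"spanning_sizes": sorted(spanning_sizes), "lower_bound": lower_bound}

# Hypothetical locus: made-up flanks around a CAG repeat.
UNIT, LEFT, RIGHT = "CAG", "TTGACCT", "CCGTTAA"
reads = [
    LEFT + UNIT * 5 + RIGHT,   # spans a short allele of 5 units
    LEFT + UNIT * 8,           # runs from the left flank into a longer repeat
    UNIT * 10,                 # lies entirely inside a long repeat
]
result = summarize_locus(reads, UNIT, LEFT, RIGHT)
print(result)
```

In this toy run, the spanning read pins one allele at five units, while the flanking and in-repeat reads show that another allele must be at least 10 units long. A real caller has to turn that kind of partial evidence into a size estimate for repeats longer than a single read.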
Illumina put the software out to the open-source community almost a year ago. Other developers have built extensions targeted to specific repeats, then shared their work with other ExpansionHunter users.
"ExpansionHunter could be used in the clinical context by people in clinical labs to call specific expansions that are known to be pathogenic," Dolzhenko said. An extension could help academicians find repeat expansions that could lead to discovery of new genetic links, he noted.
One extension, which Dolzhenko said has not yet been published, can perform genome-wide analysis without being told where to look for repeats. That takes a lot of computing power, but scans of single samples can be run on a standard PC in a few hours, without the need for high-performance computing installations.
"We tried to make it as efficient as possible," Dolzhenko said. "That was an important use case for me." He noted that the Amazon Web Services cloud charges by the amount of computing power used, so a more efficient process saves researchers money on large cohorts of data.
Among the collaborators on ExpansionHunter research is the 100,000 Genomes Project's Rare Disease Programme in the UK, which is looking for various kinds of expansions in patients whose symptoms, such as those of fragile X syndrome, are consistent with repeat expansions. "For those people, they actually did a deep dive where they applied this method and they found some families with a child that had an expansion that could explain the phenotype," Dolzhenko said.
In one "exemplary" case, according to the poster, the researchers noted a phenotype that fit the fragile X profile, but clinicians did not have enough evidence to make a diagnosis.
"The phenotype is consistent with fragile X, but I guess it was not consistent enough for this person to be diagnosed before the whole-genome sequencing was done," Dolzhenko explained. "The genetic change is very simple," he added while gesturing toward data on fragile X. "You have this type of repeat, which, in this case, is just two Gs followed by four Cs."
Dolzhenko said 100,000 Genomes has other interesting results related to this work that have not been made public yet, but should be in the near future.