NEW YORK (GenomeWeb News) – A group of researchers from Affymetrix, Genentech, and the Stanford Genome Technology Center has developed a protocol that improves upon high-throughput array-based resequencing methods. The work is scheduled to appear online this week in the Proceedings of the National Academy of Sciences.
The team's approach relies on three steps: amplifying target sequences from genomic DNA, allele enrichment to separate variant from non-variant DNA into distinct pools, and array-based resequencing. When they applied the approach to HapMap samples, resequencing the exons of about 1,500 genes in nearly 500 samples, the researchers identified almost 30,000 previously unidentified variants.
Senior author Malek Faham, formerly of Affymetrix and now with a start-up company called MLC Dx focused on molecular diagnostics, told GenomeWeb Daily News that the resequencing approach addresses the two key problems that have held back high-throughput, array-based resequencing in the past: sample preparation and data quality.
He said the team's multiplex amplicon method lets researchers scale up sample preparation beyond what older techniques allowed, while the allele enrichment step improves data quality and lowers the false positive rate.
Although arrays have helped researchers evaluate common variants in hundreds or thousands of individuals in a single study, the common variants found so far often explain relatively little disease risk, Faham and his co-authors noted.
That suggests the yet-undetected, disease-related genetic variation may lie in rare variants, copy number variants, or multiple common variants that confer even lower risk than those found so far. Consequently, some researchers are moving towards resequencing approaches in an effort to identify rare genetic variants associated with disease.
But, Faham said, so far high-throughput, array-based resequencing approaches have lagged behind other array technology, particularly in terms of sample preparation and quality control. In an effort to improve and streamline this process, Faham and his colleagues developed what they call a "fully multiplexed high-throughput pipeline that results in high-quality data."
"This technology enables the generation of high-quality sequencing data over many megabases in thousands of samples," the authors explained. "[T]hese types of data are necessary for resequencing-based association studies."
First, they captured the loci of interest from genomic DNA using a technique called target amplification by capture and ligation, or TACL. Although constructing probes for this step can be time-consuming, Faham said, the probes can be reused to process hundreds or thousands of samples.
Next, the researchers used an allele enrichment step called mismatch repair detection (MRD), which relies on bacterial mismatch repair machinery. For this step, the team introduced the DNA into bacteria that are genetically engineered to recognize DNA mismatches. Because the transformed bacteria's ability to grow on a given medium depends on the DNA they carry, the researchers could then distinguish bacteria carrying variant DNA sequences from those carrying non-variant sequences.
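The sorting logic of this step can be illustrated with a deliberately simplified toy model. It is not the authors' protocol: it treats each amplicon as a single DNA string, compares it against a reference copy of the same locus, and places it in the "variant" pool if pairing the two would leave any mismatch. In the real assay that decision is made biologically, by which growth medium the engineered bacteria survive on. The names `Amplicon`, `has_mismatch`, and `partition_pools`, and the example sequences, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    """Hypothetical amplified target region from one sample."""
    sample_id: str
    locus: str
    sequence: str


def has_mismatch(sample_seq: str, reference_seq: str) -> bool:
    """Return True if pairing sample_seq with reference_seq leaves a mismatch.

    Toy stand-in for the bacterial mismatch-repair readout. Assumes
    equal-length sequences, i.e. substitutions only (no insertions/deletions).
    """
    if len(sample_seq) != len(reference_seq):
        raise ValueError("toy model handles substitutions only")
    return any(a != b for a, b in zip(sample_seq, reference_seq))


def partition_pools(amplicons, reference):
    """Split amplicons into (variant_pool, non_variant_pool) using per-locus references."""
    variant_pool, non_variant_pool = [], []
    for amp in amplicons:
        if has_mismatch(amp.sequence, reference[amp.locus]):
            variant_pool.append(amp)
        else:
            non_variant_pool.append(amp)
    return variant_pool, non_variant_pool


# Hypothetical example: one locus, two samples, one carrying a substitution.
reference = {"geneA_exon1": "ACGTACGT"}
amplicons = [
    Amplicon("S1", "geneA_exon1", "ACGTACGT"),  # matches the reference
    Amplicon("S2", "geneA_exon1", "ACGAACGT"),  # T->A at the fourth base
]
variant, non_variant = partition_pools(amplicons, reference)
print([a.sample_id for a in variant])      # ['S2']
print([a.sample_id for a in non_variant])  # ['S1']
```

Only the variant pool needs close scrutiny downstream; the point of the enrichment step is that the two pools can then be resequenced separately.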
Finally, the variant and non-variant DNA pools were subjected to array-based resequencing. Faham explained that he and his team developed custom Affymetrix microarrays corresponding to the probe sequences used in the target amplification step.
The researchers tested this resequencing pipeline, looking for rare variants in the exonic sequences of roughly 1,500 genes in each of 473 DNA samples obtained from the international HapMap project.
Their approach turned up many known SNPs as well as 29,519 new, unique variants. Based on the results of this pilot study, the team reported that their resequencing approach has a false positive rate of about one per 500,000 base pairs and a false negative rate of roughly ten percent.
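To put those two rates in perspective, a short worked calculation helps. Only the rates come from the study; the 1-megabase target and the count of 100 real variants below are hypothetical values chosen to make the arithmetic easy to follow, not figures from the paper.

```python
# Worked arithmetic for the reported error rates.
# The rates are from the article; the example target size and variant count
# are hypothetical and chosen only for illustration.

BASES_PER_FALSE_POSITIVE = 500_000  # about one false positive per 500,000 bp
FALSE_NEGATIVE_RATE = 0.10          # roughly ten percent of real variants missed


def expected_false_positives(bases_called: int) -> float:
    """Expected number of spurious variant calls over `bases_called` bases."""
    return bases_called / BASES_PER_FALSE_POSITIVE


def expected_true_calls(true_variants: int) -> float:
    """Expected number of real variants detected, given the false negative rate."""
    return true_variants * (1 - FALSE_NEGATIVE_RATE)


# Hypothetical example: 1 megabase of sequence containing 100 real variants.
fp = expected_false_positives(1_000_000)
tp = expected_true_calls(100)
fdr = fp / (fp + tp)  # share of all variant calls that are spurious

print(f"expected false positives: {fp:.1f}")      # 2.0
print(f"expected real variants found: {tp:.1f}")  # 90.0
print(f"share of calls that are false: {fdr:.1%}")  # 2.2%
```

The last figure depends heavily on the assumed variant density: the sparser the real variants in a region, the larger the share of calls that a fixed per-base false positive rate contributes.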
The researchers did not disclose the location of the newly identified variants. Faham said those data, along with resequencing results generated for several types of tumor samples, will be published in a separate study.
In addition to demonstrating that the approach can identify rare genetic variants, the team outlined specific ways to improve and further scale up the technology. For instance, Faham noted that proof-of-principle experiments suggest it should be possible to produce the pool of target amplification probes en masse.
The researchers also noted that the modularity of target amplification and MRD suggests it should be possible to pair these approaches with high-throughput sequencing platforms instead of arrays in the future. When the researchers finished this work at the end of 2007, Faham explained, arrays were much more cost-effective than second-generation sequencing. But with the cost of sequencing going down, that may change in the near future.
And because the approach can distinguish between variant and non-variant DNA, Faham noted, it may be possible to decrease the cost of sequencing by an order of magnitude simply by focusing on the variant DNA pool alone rather than resequencing the entire genome.
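The reasoning behind that estimate is simple proportionality: if only the variant pool is sequenced, sequencing load shrinks by the ratio of all amplicons to variant-carrying amplicons. The sketch below makes that explicit. The function name and amplicon counts are hypothetical, the article does not say what fraction of amplicons ends up in the variant pool, and the cost of the enrichment step itself is ignored.

```python
# Hypothetical illustration of the sequencing saved by reading only the
# variant pool. Amplicon counts are made-up example values; the cost of the
# allele enrichment step is not modeled.

def sequencing_fold_reduction(total_amplicons: int, variant_amplicons: int) -> float:
    """Fold reduction in sequencing load when only the variant pool is sequenced."""
    if not 0 < variant_amplicons <= total_amplicons:
        raise ValueError("variant_amplicons must be in (0, total_amplicons]")
    return total_amplicons / variant_amplicons


# If roughly 1 in 10 amplicons carries a variant, sequencing only that pool
# needs about a tenth of the reads: an order-of-magnitude saving.
print(sequencing_fold_reduction(10_000, 1_000))  # 10.0

# The saving scales with how rare variant-carrying amplicons are.
for variant_count in (5_000, 1_000, 200):
    print(variant_count, sequencing_fold_reduction(10_000, variant_count))
```

The trade-off is that any variant the enrichment step fails to flag is never sequenced at all, so the saving comes at the price of the pipeline's false negative rate.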
"Genotyping arrays have enabled large association studies through genotyping tens of thousands of samples," Faham and his colleagues wrote. "By creating appropriate 'upfront' processes for resequencing arrays we have created the potential to conduct similar large-scale resequencing-based association studies."