Researchers at Affymetrix, Genentech, and Stanford University's Genome Technology Center have created a new array-based resequencing assay that they claim is scalable enough to compete with second-generation sequencing technologies.
A paper published in the early edition of the Proceedings of the National Academy of Sciences provides a proof of principle for the method, which is performed on the Affymetrix GeneChip platform.
The paper describes a "fully multiplexed high-throughput pipeline that results in high-quality data." The method provides target amplification from genomic DNA followed by allele enrichment to generate pools of purified variant or nonvariant DNA, and the interrogation of purified DNA on resequencing arrays.
According to the paper, the authors used this pipeline to resequence approximately 5 megabases of DNA on three arrays corresponding to the exons of 1,500 genes in 473 samples. The study, which sequenced a total of 2,350 megabases, returned a false positive rate of 1 in 500,000 base pairs and a false negative rate of around 10 percent.
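As a rough check on these figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the false positive rate applies uniformly to every base interrogated, which the article does not spell out, and the function and variable names are made up for the example.

```python
# Figures quoted in the article
BASES_PER_SAMPLE = 5_000_000         # "approximately 5 megabases"
SAMPLES = 473
TOTAL_REPORTED = 2_350_000_000       # "a total of 2,350 megabases"
FALSE_POSITIVE_ONE_IN = 500_000      # "1 in 500,000 base pairs"


def expected_false_positives(bases, one_in=FALSE_POSITIVE_ONE_IN):
    """Expected false positive calls, assuming a uniform per-base rate."""
    return bases / one_in


# Nominal total if every sample covered exactly 5 Mb
nominal_total = BASES_PER_SAMPLE * SAMPLES

fp_per_sample = expected_false_positives(BASES_PER_SAMPLE)
fp_whole_study = expected_false_positives(TOTAL_REPORTED)

print(nominal_total)    # 2365000000
print(fp_per_sample)    # 10.0
print(fp_whole_study)   # 4700.0
```

The nominal total of about 2,365 megabases is close to the 2,350 megabases reported, and under the stated assumption the quoted rate works out to roughly 10 false positive calls per sample.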
To be sure, Affy has in the past sold arrays for resequencing. However, the paper states that there have always been two main challenges to creating a resequencing assay that can meet users' demands: It must have scalable sample preparation and yield accurate results. According to co-author Malek Faham, the new method resolves both issues.
In their paper, the co-authors used custom-designed resequencing arrays that Faham said use "half the real estate" of existing resequencing arrays. Each chip in the three-chip set used in the assay contains 1.65 million perfect-match (PM) probes that match the reference sequence of the human genome. Each reference base has a PM probe in which that base lies in the middle position, in this case the 13th position, as the probes are 25 bases long.
According to the paper, each PM probe had three matching mismatch probes in which the middle base was replaced with each of the three other bases. Because the probes were tiled along the genome, each position in the genome was represented in 25 different PM probes. The probes were complementary to only one of the strands, and the strand switched at every adjacent position.
"We previously found that correlation of signals from probes in different strands is lower than for neighboring probes on the same strand, and therefore the switching provides more information than solely with one strand while requiring only half as many probes as standard resequencing arrays that use both strands at each position," the authors wrote.
To obtain more information in regions near known SNPs, the authors also made extra tiling probes in those regions. For each PM probe, if there was a known SNP in dbSNP within the 25-mer sequence, then there was one additional probe for each allele of the SNP. The probe used was identical to the PM except that at the SNP position, the base matched the nonreference allele of the SNP, the paper states.
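The probe layout described above can be sketched in Python. This is a minimal illustration, not the authors' design software: the helper names (`tile_position`, `tile_reference`), the dictionary format for known SNPs, and the choice to put even positions on the "+" strand are all assumptions made for the example. Probe sequences are written in the orientation of the strand they represent, which glosses over the probe/target complementarity on a real array.

```python
PROBE_LEN = 25
MID = PROBE_LEN // 2   # 0-based index 12, i.e. the 13th position
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def substitute(seq, index, base):
    """Return seq with the base at `index` replaced."""
    return seq[:index] + base + seq[index + 1:]


def tile_position(reference, center, known_snps=None):
    """Build the probe set for one reference base (hypothetical helper).

    known_snps maps a 0-based reference position to a list of
    nonreference alleles (an assumed stand-in for a dbSNP lookup).
    """
    start = center - MID
    pm = reference[start:start + PROBE_LEN]

    # Three mismatch probes: the middle base replaced by each other base.
    mismatches = [substitute(pm, MID, b) for b in "ACGT" if b != pm[MID]]

    # One extra probe per nonreference allele of any known SNP in the 25-mer.
    # A SNP at the middle position would duplicate a mismatch probe; the
    # sketch does not deduplicate that case.
    snp_probes = []
    for pos, alt_alleles in (known_snps or {}).items():
        offset = pos - start
        if 0 <= offset < PROBE_LEN:
            snp_probes += [substitute(pm, offset, a)
                           for a in alt_alleles if a != pm[offset]]

    # Strand alternates at every adjacent position (parity is an assumption).
    strand = "+" if center % 2 == 0 else "-"
    orient = reverse_complement if strand == "-" else (lambda s: s)
    return {
        "center": center,
        "strand": strand,
        "pm": orient(pm),
        "mm": [orient(m) for m in mismatches],
        "snp": [orient(s) for s in snp_probes],
    }


def tile_reference(reference, known_snps=None):
    """Tile every base that has 12 flanking bases on each side."""
    return [tile_position(reference, c, known_snps)
            for c in range(MID, len(reference) - MID)]
```

For example, `tile_reference("ACGT" * 10, {0: ["G"]})` yields 16 probe sets, each with one PM probe and three mismatch probes, with strands alternating from one position to the next. Only the first set, whose 25-mer covers position 0, gains an extra SNP probe.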
According to Faham, the new approach could facilitate a new round of genome-wide association studies that could give researchers the opportunity to look at rare alleles, rather than the common alleles that are found on most arrays sold for use in GWASs today.
"The basic genotyping platforms look at common alleles and they are good at testing common allele/common disease hypotheses," Faham told BioArray News this week. "But they have much less power to detect rare alleles that are associated with disease."
Faham, a co-founder of ParAllele Bioscience and former Affy scientist now working at diagnostics startup MLC Dx, said he believes that large-scale genotyping studies empowered by the new assay are necessary to survey rare alleles for links to diseases.
"There is a basic need to sequence a lot of people, and we are proposing [that] this technology can be scaled up to do that," Faham said. "We know we can do a lot of samples on arrays. It's been done with high-throughput genotyping, so why not do it with resequencing?"
TACL and MRD
The unnamed method was designed to address the two main obstacles to large-scale resequencing: the need for scalable sample preparation and the need for accurate results.
According to the paper, the solution was to use target amplification by capture and ligation (TACL) to amplify the specific loci of interest, followed by an allele-enrichment step with mismatch repair detection (MRD) in which amplicons carrying variant and nonvariant alleles are separated into almost pure homozygous states.
According to the paper, the method then sequences the enriched alleles on the array with an algorithm that makes sequencing calls. All sample-handling steps other than the array-handling steps are performed in a 96-well plate.
"TACL allows us to hunt down the regions of the genome we want," said Faham, who likened the method's scalability to multiplex PCR. The MRD step, which purifies the alleles, enables more accurate results, he said.
"We detect mismatches, sense them, take the mixture, and split the culture into two tubes," Faham said. "In one tube, you will have variant fragments; in the other non-variant fragments. You’ll be looking at pure populations, which is easier for arrays to distinguish, and you will get a much better performance."
Faham said that the fact that sample prep is handled in a 96-well format could eventually make the method attractive for users. Though Affy has debuted a 96-well format array platform called GeneTitan, and plans to make genotyping assays available for use on the platform later this year, Faham said the work detailed in the PNAS paper was completed before GeneTitan was developed and has not yet been designed to work on the new platform.
Modular Future
While Faham said he hopes the assay will eventually be adopted for large genome-wide resequencing projects, it is unclear what Affy's intentions are for the assay. The paper itself states that while the work was done at Affy, there are "no current products or specific plans to make products of work described in this manuscript."
An Affy spokesperson told BioArray News in an e-mail this week that the Santa Clara, Calif.-based array vendor is "not disclosing any plans to commercialize this assay," though customers "can look forward to Affymetrix deploying products that incorporate advanced enzymatic technologies in the future."
Faham declined to comment on any plans to commercialize the new approach.
Affymetrix has previously discussed plans to make a high-throughput resequencing assay available as a product. Last year, former CEO Stephen Fodor said that the firm had plans to introduce a sequencing-on-chip product that would perform surface enzymatic labeling reactions directly on its arrays (see BAN 1/15/2008).
Fodor said at the time that the firm had developed "new chemistry" that enabled it to use both polymerase- and ligase-labeling reactions directly on an array surface. "We can target all or any particular region of the genome, and so then targeting genotyping or all the way to the whole genome will be available with this technology," he had said.
Faham said the assay is ideal for genome-wide resequencing studies rather than as a follow-up technology for genome-wide association studies, though it can be used in that context too.
"My personal vision is to do large studies in samples to pick up what genotyping missed," Faham said. "If we need to run genome-wide studies, that is ultimately what we need. The question is what technology can take you there. You need to do high-throughput sequencing of samples, that is for sure."
Faham also said the method could not only be seen as a rival to second-generation sequencing platforms, but could also be used to complement projects that use those instruments.
"The way I see it is that this technology is made in modules," Faham said. "There is TACL, MRD, and detection. One can imagine that you can mix and match this module," he said. "You could take a piece of the pipeline, like TACL or MRD, and use it with the next-gen tools. All next-gen companies are looking for target-preparation solutions and this is part of that puzzle."