The jury is still out on whether demand for genotyping will spur the creation of high-throughput centers modeled on the sequencing centers, but Daniel Koboldt is preparing for a potential onslaught. A bioinformatics developer at the SNP Research Facility at Washington University School of Medicine in St. Louis, Koboldt envisions a day when academic genotyping centers will be as essential as the “G5” sequencing centers, and he’s already developed a laboratory information management system to support the endeavor.
Following on its major role in the Human Genome Project, Wash U is responsible for finding the most useful SNPs on chromosome 7p for the International HapMap Project. Collaborating with a group led by Pui-Yan Kwok at the University of California, San Francisco, the Wash U researchers must process massive amounts of raw SNP data provided by the HapMap data control center at Cold Spring Harbor Laboratory to find the SNPs most likely to be informative. The team then tests these SNPs in high-throughput assays with the goal of narrowing the set down to 11,500, which will provide the initial target coverage of one SNP per 5 kb for the region.
Many of the HapMap collaborators are using genotyping systems from Third Wave or Illumina, but the UCSF/Wash U team has taken a slightly different tack. UCSF’s Kwok developed a method known as template-directed dye-terminator incorporation assay with fluorescence polarization detection (FP-TDI), in which a SNP-specific primer anneals immediately upstream of the polymorphic site in the target DNA and is extended by a single dye-labeled terminator; the resulting change in fluorescence polarization indicates which allele is present. The method enables high-throughput SNP genotyping in a 384-well plate format, and can generate about 10,000 genotypes per day, according to the Wash U team.
PerkinElmer is commercializing the technology, which it licensed from Kwok, and the company provided the system that the Wash U group is using, except for one component: a LIMS. While PerkinElmer does offer several LIMS software products, as well as software that operates the FP-TDI machinery, “their system is designed for qualitative analysis rather than quantitative,” Koboldt said. “So if you were interested in one SNP ... what comes with the machine would be fine. But with the HapMap project, what they give us is not sufficient for a high-throughput system, and that’s where our LIMS came into play.”
The LIMS manages the SNP data both before and after the assay plates are fed through the fluorescence reader. A set of scripts reads through XML files provided by the data control center to search for any changes in the records of each SNP and then update Wash U’s local SNP database accordingly. The software then helps select which SNPs to assay based on their quality classifications and helps design the assays and select the PCR and TDI primers for each SNP. After the plates are run, PerkinElmer’s SNPscore software performs the SNP calling, but even this process has been modified via customized interfaces that guide the researchers in assessing the quality of the assay. Koboldt wrote additional Perl scripts to check the data for errors, such as Mendelian violations, and to calculate allele frequencies. In a final step, the system formats the genotype data into a specified HapMap-approved XML schema, which it submits back to the data control center via FTP.
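The two post-assay checks mentioned above, Mendelian violations and allele frequencies, can be illustrated with a minimal sketch. This is not Koboldt's code (his scripts were written in Perl and are not shown here); the function names and the two-letter genotype strings such as "AG" are assumptions made purely for illustration, and missing or failed calls are not handled.

```python
from collections import Counter
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from each parent.

    Genotypes are assumed (for this sketch) to be two-character
    strings of alleles, e.g. "AG".
    """
    # Every unordered pair made of one paternal and one maternal allele.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def allele_frequencies(genotypes):
    """Return each allele's share of all alleles observed in the sample."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    # A trio in which the child carries a G that neither parent has.
    print(is_mendelian_consistent("AA", "AA", "AG"))
    print(allele_frequencies(["AA", "AG", "AA", "AA"]))
```

A trio that fails the first check usually points to a genotyping error or a sample mix-up rather than a true biological event, which is why such checks serve as a quality filter before data submission.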
Koboldt estimated that the lab is currently about 90 percent of the way through the first set of HapMap data, “but I don’t think we’d be anywhere near that if we didn’t have a system that put it all together like this.”
The current system is probably “sufficient” to complete work on HapMap, Koboldt said, but he’s continuing to develop it with the goal of making it available to a broader user base. PerkinElmer has expressed interest in bundling the LIMS with its FP-TDI system, and Koboldt said he’s also made the system available to other HapMap participants using FP-TDI, such as the Beijing Genomics Institute, and would be willing to share his work with other interested research groups.
In addition, Koboldt said, “What we want to do next is expand [the system] so that we can become a major genotyping center, much like the genome sequencing center at Wash U.” By planning the LIMS component ahead of time, Koboldt said he hopes to circumvent some of the difficulties that the sequencing group experienced: “Once they did the human genome, they went on to take on a bunch of custom sequencing projects, and they had to continually build their information systems to handle that,” he said.
Steps toward that goal include a project-request center on the lab’s website (http://snp.wustl.edu/index.html), where customers can submit projects and track their progress. The feature is already paying off as researchers stumble upon the site, Koboldt said. The group even had a query from an Indian biotech company that was interested in outsourcing its genotyping to the lab, a request that Koboldt said the group found surprising given that outsourced work currently tends to flow in the opposite direction.