NEW HAVEN, Conn.--Bioinformatics played a critical part in enabling CuraGen to discover 55,000 more coding SNPs, nearly doubling its cSNP database to 115,000--25 percent of the estimated total cSNPs in the human genome.
Richard Shimkets, director of internal discovery at CuraGen, told BioInform that the company has concentrated on cSNPs, which lie in the coding regions that make up about 5 percent of the human genome, because "changing the function is really what we're most interested in, whether that results in predisposition to disease or a change in the way one responds to a therapeutic drug." SNPs found in the remaining 95 percent, the noncoding portion of the human genome, are less likely to influence the function of proteins and thus have less value in personalized medicine and disease association studies, he added.
Bioinformatics is integral to cSNP discovery because of the sheer volume of data that must be sorted. The company's analysis involved over 4 million human sequences, each of which contained as many as 400 bases, resulting in over a billion pieces of information. In addition, each nucleotide in every sequence has an associated quality score indicating how likely it is to be correct. "This is not something that could have been done without a lot of bioinformatics work," said Shimkets. "Without our dedicated team of bioinformaticists, we would never have been able to make this kind of a discovery."
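To give a sense of the scale and of how per-base quality scores might be used, here is a minimal sketch in Python. It is an illustration only, not CuraGen's method: the article does not say which scoring scheme the company used, so the sketch assumes Phred-style scores (a common convention in which a score Q corresponds to an error probability of 10^(-Q/10)). The function names and the threshold of 20 are hypothetical. Because 400 bases is a per-sequence maximum, the computed figure is a ceiling, which is consistent with the "over a billion" cited above.

```python
def max_base_calls(num_sequences: int, max_bases_per_sequence: int) -> int:
    """Upper bound on base calls if every sequence reached the maximum length."""
    return num_sequences * max_bases_per_sequence


def phred_error_probability(q: float) -> float:
    """Error probability for a quality score, ASSUMING Phred-style scaling."""
    return 10 ** (-q / 10)


def filter_high_quality(bases: str, qualities: list[int], min_q: int = 20):
    """Keep (position, base) pairs whose score meets a hypothetical threshold."""
    return [
        (i, base)
        for i, (base, q) in enumerate(zip(bases, qualities))
        if q >= min_q
    ]


if __name__ == "__main__":
    # Figures from the article: 4 million sequences, up to 400 bases each.
    print(max_base_calls(4_000_000, 400))
    # Under the Phred assumption, a score of 20 means a 1-in-100 error chance.
    print(phred_error_probability(20))
    # Toy read: only positions with scores of at least 20 are kept.
    print(filter_high_quality("ACGT", [30, 10, 20, 5]))
```

In a sketch like this, only base calls that clear the threshold would be considered as evidence for a candidate variant, which is one reason a quality score on every nucleotide adds so much to the data that must be handled.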
Once the analysis is done, bioinformatics is applied to feed the information to biologists "in a way that makes sense," he noted. CuraGen's discovery team has been working closely with its bioinformatics group to build interfaces on the company's GeneScape platform that allow a scientist to "drill down into a gene" and get information about the kinds of variations in CuraGen's database. The researcher can then plan an experiment around those variations in a way that might be "biologically meaningful," he added.
CuraGen achieved its findings by using its SeqCalling sequence normalization technology along with a high-throughput facility and staff who worked multiple shifts, commented Shimkets. "That's really how we've been able to obtain this result so quickly," he observed. The company, which began the SeqCalling process on human tissues a little over a year ago, has generated over 2.5 million new sequences. Various estimates put the number of cSNPs at between 250,000 and 400,000, although there could be many more than that, Shimkets remarked. "Based on some of our work, I think it is possible that there could be as many as 600,000 cSNPs."