When studying the causes of human disease, researchers turn again and again to the mutations people carry in their genomes. Determining an individual's genotype could be the difference between finding the genetic cause of, say, diabetes and missing it entirely. But identifying SNPs and calling genotypes has always been time-consuming and riddled with false positives. Because of technological limitations, researchers have until now been able to determine only parts of an individual's genotype using next-gen sequencing, and only some DNA sequence variants in certain genomic intervals using deep sequencing.
Researchers at the Scripps Research Institute have come up with a novel method that uses population sequence data to identify SNPs and assign genotypes to individuals. They call it SNIP-seq, for single nucleotide polymorphism identification from population sequence data. Vikas Bansal, first author on the team's paper in Genome Research, says the method is highly accurate and reduces the rate of false positives, which are sometimes caused by sequencing errors. "The motivation behind the study was that you have a lot of tools for aligning the short reads generated by the next-generation sequencing platforms to a reference genome and you also have tools for identifying SNPs, but when you have population sequence data, you can leverage the fact that you have multiple individuals' sequences across the same genomic regions to improve both the accuracy of SNPs and the genotype calling," Bansal says.
To evaluate the accuracy of their method, the researchers used sequence data from 48 people, generated on the Illumina Genome Analyzer platform and covering a 200-kilobase region on human chromosome 9p21, where genes that add to a person's risk of developing coronary artery disease and diabetes are located. They found that the SNIP-seq method detects variants accurately and can filter out false SNPs. In addition, the researchers found novel SNPs in this region of the chromosome, which they later validated using pooled sequencing data and confirmed using Sanger sequencing. The team also estimates that SNIP-seq achieves a false-positive rate of about 2 percent, an improvement on previous methods, which in Bansal's experience have a rate of about 5 percent.
By going back and using this new, more accurate method to re-sequence genomic regions associated with disease, it may be possible to better detect rare variants that contribute to disease progression. With less accurate methods, sequencing errors in a few individual sequences can be classified as real SNPs, which could send disease researchers on a wild goose chase. The Scripps team adds that by using population sequence information, as the SNIP-seq method does, scientists can potentially distinguish false SNPs from real ones.
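To see why looking at individuals across a population can help, consider a toy Python sketch. It is not the SNIP-seq algorithm, which the article does not describe in detail; it is a simple binomial genotype model chosen for illustration. The 1 percent sequencing error rate, the function names, and the read counts are all assumptions invented for the example.

```python
from math import comb, log

def binom_logpmf(k, n, p):
    """Log-probability of k successes in n trials with success rate p."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

def genotype_logliks(alt, depth, err=0.01):
    """Log-likelihoods of carrying 0, 1, or 2 copies of the variant allele.

    Assumed model: variant-supporting reads make up a fraction `err` of reads
    in a non-carrier, 0.5 in a heterozygote, and 1 - err in a homozygote.
    """
    return [binom_logpmf(alt, depth, f) for f in (err, 0.5, 1 - err)]

def call_genotype(alt, depth, err=0.01):
    """Most likely number of variant copies (0, 1, or 2) for one individual."""
    ll = genotype_logliks(alt, depth, err)
    return max(range(3), key=lambda g: ll[g])

def site_score(samples, err=0.01):
    """Evidence that a site is a real SNP, summed over all individuals.

    For each (alt_reads, depth) pair, add the log-likelihood gained by the
    best genotype over the "no variant" genotype. Scattered errors add 0.
    """
    score = 0.0
    for alt, depth in samples:
        ll = genotype_logliks(alt, depth, err)
        score += max(ll) - ll[0]
    return score

# Hypothetical data for 48 individuals, each sequenced to depth 30.
# Error-like site: one stray variant read in every individual (48 in total).
error_like = [(1, 30)] * 48
# SNP-like site: three carriers with half their reads variant (45 in total).
real_like = [(15, 30)] * 3 + [(0, 30)] * 45

if __name__ == "__main__":
    print("error-like site score:", site_score(error_like))
    print("SNP-like site score:  ", site_score(real_like))
```

Both made-up sites have a similar total number of variant reads, so pooled counts alone would not separate them. Scoring each individual and then combining the evidence does: the stray reads never look like a carrier, while the three individuals with half their reads variant do.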
"This method will allow researchers who are doing population sequencing studies to get to the point where [they] have a set of variants that are accurate and real, and also to make sure [they] have the correct genotype," Bansal says. The researchers say the results show that using population sequencing data improves the success rate of detecting SNPs and assigning genotypes to individual samples. This breakthrough could lead to more accurate study of disease-causing genetic variants and to viable targets for new drugs.