Scientists are using next-generation sequencing to rapidly discover and characterize SNPs in animals, in turn enabling them to quickly generate tools for genotyping studies.
In a study of cattle, researchers led by the US Department of Agriculture recently performed reduced-representation library sequencing on Illumina’s Genome Analyzer to identify, and validate by genotyping, approximately 23,000 new SNPs and to compare predicted and actual allele frequencies.
The research, which appeared last month in Nature Methods, yielded a bovine genotyping array now sold by Illumina. Taking advantage of the low cost of the approach, researchers are now pursuing similar projects in other animals including swine, sheep, songbirds, fish, and water buffalo.
“This method is really time-saving and saves a lot of money,” said Larry Schook, a professor of comparative genomics at the University of Illinois at Urbana-Champaign and a director of the International Swine Sequencing Consortium, who was not involved in the cattle study. “We really get an opportunity to look at minor allele frequency and don’t have to spend a lot of time and money to validate populations.”
“I think in agriculture, it will have very rapid uptake,” said Curt Van Tassell, a research geneticist in the Bovine Functional Genomics Laboratory at the USDA’s Agricultural Research Service, and the lead author on the cattle project.
More than a year ago, Van Tassell and his colleagues, in collaboration with Illumina, set out to design a 60,000-bead bovine genotyping assay to help them predict what traits the animals would be able to pass on to their offspring.
Initially, they had planned to design the assay using publicly available SNP data from the bovine sequencing project and other studies. “But we got to the point very quickly of realizing that the distribution of SNPs available in the public domain was not suitable for what we believed to be an optimal design,” Van Tassell said.
They then approached several genome centers as possible partners for a Sanger sequencing-based SNP-discovery project, but “had trouble identifying a partner that would do sequencing for us at a price that would allow us to get enough SNPs to do what we needed to do,” Van Tassell said. After briefly considering 454, he got in touch — almost by chance — with what was then Solexa.
At the time, Van Tassell’s colleague Tad Sonstegard was collaborating with Solexa on a transcript profiling project, and the teams agreed to replace some sequencing runs from that project with runs for the SNP-discovery project. Solexa scientists also “thought the strategy had some merit,” Van Tassell said, “although it took some serious convincing for them to try it.”
While Solexa provided the sequence data, Van Tassell and his team processed the data and designed the SNP BeadChip, having “full control over the final assay.”
“It was actually that coincidence that we had someone positioned in the sequencing queue in-house at Solexa on a project that allowed us to use that strategy,” he said.
The scientists sequenced three pools of DNA from a total of 66 cattle in five sequencing runs. The genome content of each pool was reduced by digesting it with a restriction enzyme and selecting fragments of a certain size for sequencing. That strategy “seems to do exactly what we hope to do, that is, have essentially a random distribution of fragments from throughout the genome,” Van Tassell said, with only a few “hot spots” and “cold spots.”
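To make the idea of a reduced-representation library concrete, the following is a minimal in silico sketch of the two steps described above: cutting a sequence at a restriction site, then keeping only fragments within a size window. It is an illustration, not the study's pipeline. The recognition site, cut position, toy sequence, and size window are all hypothetical values chosen for the example, and the function names `digest` and `size_select` are invented here.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is how many bases into the recognition site the cut falls
    (a hypothetical parameter for this sketch).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # trailing piece after the last cut
    return [f for f in fragments if f]  # drop any empty fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy example: an assumed 4-base site cut in the middle (blunt cut).
toy_sequence = "AAAAGGCCTTTTTTGGCCAA"
all_fragments = digest(toy_sequence, site="GGCC", cut_offset=2)
selected = size_select(all_fragments, min_len=5, max_len=8)
print(all_fragments)
print(selected)
```

Because the same sites are cut in every animal, the selected fragments come from the same subset of the genome in each pool, which is what lets a modest amount of sequencing cover those regions deeply.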
They then aligned the sequencing data to the cow genome and identified more than 62,000 putative SNPs while predicting their allele frequencies at the same time. Based on genotyping data, the false-positive rate was about 8 percent, and the allele frequencies measured by genotyping and by sequencing correlated at approximately 70 percent, or with “relatively high confidence,” according to Van Tassell.
Based on the sequence reads, which Van Tassell and his team received at the end of December 2006 and in late spring of 2007, they designed a genotyping assay, which they sent to Illumina in June.
The company provided them with their first genotyping chips by the end of September 2007. Illumina now offers the assay as the BovineSNP50 BeadChip, which has more than 54,000 probes that target SNPs, almost half of which, or 24,000, were discovered using the Genome Analyzer.
“The Solexa platform seems to be extremely well suited for this strategy,” Van Tassell said. “And there doesn’t seem to be any downside to the short reads. In fact, the short reads meant that almost all the SNPs had unique locations on the genome rather than being co-located on the same read.”
But the main advantage of the approach is its low cost, Van Tassell said. In the Nature Methods article, the scientists estimate that the reagent cost per SNP in their project was approximately $0.48, compared to about $2.95 in an unrelated project that used Sanger sequencing.
With recent improvements in library construction and sequencing performance, Van Tassell estimated the project today would cost between 10 cents and 25 cents per SNP.
A second advantage is that, because of the depth of coverage the next-gen sequencer generates, researchers are able to make allele frequency estimates, which Sanger sequencing does not offer. “That’s the real clever part, that you can get some estimates of minor allele frequencies from this sequence,” Schook said.
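As a rough illustration of why read depth makes this possible, the sketch below estimates an allele frequency at one site from the number of pooled reads carrying each allele. This is a simplified assumption about how such an estimate can be made, not the method used in the cattle study; the function names and read counts are hypothetical.

```python
def estimate_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a site that carry the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def minor_allele_frequency(ref_reads, alt_reads):
    """Frequency of whichever allele is rarer at the site."""
    p = estimate_allele_frequency(ref_reads, alt_reads)
    return min(p, 1 - p)


# Hypothetical site covered by 20 pooled reads, 5 of them with the alternate allele.
print(estimate_allele_frequency(ref_reads=15, alt_reads=5))
print(minor_allele_frequency(ref_reads=5, alt_reads=15))
```

With only a handful of reads per site the estimate is coarse, which is consistent with the imperfect agreement between sequencing-based and genotyping-based frequencies reported above.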
Van Tassell said he believes the method could be applied to a variety of species, as long as some genome draft is available against which to align the reads. Even less than a draft might be sufficient, he said.
“One could do a shotgun sequence of the desired organisms and either use shotgun reads or a poor man’s assembly of those shotgun reads to align [Illumina reads] to that, without having a full-blown assembly,” he said.
Alternatively, researchers could use 454 sequencing to generate an assembly against which to align Illumina’s reads, he added, an approach currently pursued by a sheep SNP discovery project. “That is indeed one reason why one might consider 454, if you have a species without an assembly, or with a very inadequate assembly,” he said.
With the right choice of restriction enzymes and sequencing depths, he said, the approach could even identify low-frequency polymorphisms in humans or model organisms, a goal of the 1000 Genomes Project (see In Sequence 1/22/2008).
Van Tassell, whose USDA group recently acquired its own Illumina Genome Analyzer, plans to use the same approach to discover SNPs in several other projects. These include a National Academy of Sciences-funded collaboration with researchers in Pakistan that will use the cow genome as a reference to build a SNP panel for the water buffalo, the primary milk source in the country, to help perform genome-enabled selection.
“We envision to be able to roll that out very cost-effectively compared to the traditional strategies of doing QTL mapping and discovery and then marker-assisted selection,” Van Tassell said.
The approach is catching on with others. Martien Groenen, a professor in the animal breeding and genomics center at Wageningen University in the Netherlands, ran a pilot project in turkey and pigs last year. In collaboration with a Dutch ecological research institute, he is now embarking on a project to create a high-density SNP assay for the great tit, a bird common in Europe and Asia, generating sequencing data on an Illumina Genome Analyzer with collaborators at neighboring Leiden University.
Because the genome of the great tit is not sequenced, Groenen said he originally planned to generate both 454 and Illumina sequencing data, using the long 454 reads as a reference to align the short Illumina reads to. “I [planned to] use the Solexa for the identification of the SNPs and the 454 to have longer reads for being able to design the assay, for primers on either side of the SNP,” he said.
However, following a pilot study in turkey last year, he said that he thinks he can get away with only Illumina sequence reads now by starting out with 2- to 3-kilobase fragments that are randomly fragmented and sequenced, and subsequently building them into short contigs.
“So we think that even if you don’t have a reference genome, by doing it like this, building your own small reference fragments for aligning the Solexa sequences, will work,” Groenen said.
The reason for abandoning 454 is cost: “For the same amount of money, you get at least 20- or 30-fold less sequence out of it,” he said. “That’s a really big difference.”
Having 454 reads would make the project easier, “but because it’s more costly, and also, preparing the samples is a little bit more tricky and laborious, we said, with the budget that we have, let’s first try just only Solexa sequencing.”
Groenen is planning similar projects in tilapia as well as in ducks. “I think many groups will go in this direction once they see this kind of methodology evolve, and see that with limited budgets, they still can get quite a number of SNPs identified, and then use those in all sorts of genetics studies,” he said.
Like Van Tassell, Groenen is involved in the International Swine Genome Sequencing Consortium, which recently obtained funding from the USDA for a SNP-discovery project that aims to create a 60,000-SNP chip using an approach similar to that of Van Tassell’s cattle project.
“Traditionally, we have been struggling with how much more sequencing we would have to do by Sanger, and then the cost of validation,” said Schook, the consortium’s director. “But this approach allows us to see, in real time, what SNPs would be useful.”
The swine SNP project will use a combination of 454 and Solexa sequencing, and the Wellcome Trust Sanger Institute will generate the sequence data. While 454 will provide “the longer flanking sequences,” the Illumina data “gives us some deeper sequencing, so we can really look at allele frequencies,” he said.
Besides cost, the approach is fast, Schook said, estimating that the project will take about six months. “The timeframe of going from discovery to developing an assay has really been reduced.”