SAN DIEGO (GenomeWeb) – At the Plant and Animal Genome conference here this week, the Broad Institute's David Jaffe presented a strategy for sequencing reference genomes from many different horses for an estimated price tag of around $25,000 apiece.
During an equine-focused session on Sunday, Jaffe argued that it should be possible to crank out affordable and highly contiguous horse genomes from PCR-free libraries sequenced on the Illumina HiSeq 2500 and assembled with the Broad's open source software tool "Discover Variants through Assembly" (DISCOVAR), which was released in 2013.
Complementary information could then be obtained from jump reads and long-range BioNano Genomics mapping, he added, theoretically opening the door to complete genome sequences from many individual steeds, along with a look at loci and/or structural variations that are specific to just one or a few individual horses or horse breeds.
As in other organisms, horse resequencing studies generally involve mapping reads from multiple individuals back to an established reference genome. In the case of the horse, this reference genome was established using DNA from a Thoroughbred mare named "Twilight" at a cost of around $20 million.
But although researchers have successfully applied such resequencing strategies to understand a wide range of plant and animal features, Jaffe noted that the approach likely misses some forms of genomic information that could be picked up by de novo sequencing and assembly of multiple individuals.
He noted that horse genome assemblies with long, accurate contigs (without scaffolds) could be obtained for around half of the proposed $25,000 reference genome price tag by doing de novo DISCOVAR assembly of paired-end, 250-base-pair reads generated from a single PCR-free library sequenced on one flow cell of the Illumina HiSeq instrument.
To illustrate, Jaffe presented information on human and rhinoceros genomes that have been assembled with DISCOVAR, demonstrating the longer contig N50 lengths that can be achieved with the software relative to past assemblies produced using the ALLPATHS-LG approach.
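For readers unfamiliar with the metric, contig N50 is the length at which contigs of that size or longer account for at least half of the assembled bases. A minimal Python sketch of that calculation follows. The function name and the toy contig lengths are illustrative assumptions, not figures from Jaffe's talk or output from DISCOVAR.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the statistic is weighted by bases rather than by contig count, a handful of long contigs can raise N50 substantially even when many short contigs remain in the assembly.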
To flesh out horse reference genomes by creating even longer contigs and scaffolds, meanwhile, Jaffe proposed adding in mate-pair or jump read data to fill in medium-range information in the genome, together with long-range clues provided by BioNano Genomics mapping.
He and his PAG abstract co-authors conceded that "[o]ptimal integration of these data types into a turnkey method will require substantial [research and development]." Nevertheless, they noted, "an initial demonstration for horses could be carried out this year."