To enhance its crop breeding and other research programs, Texas AgriLife Research, an agricultural and life sciences research agency that is part of the Texas A&M University System, recently installed Illumina's HiSeq 2000 and is currently waiting to receive the 2500 upgrade.
Illumina said previously that the 2500 will be available during the second half of this year, and that it expects at least 60 percent of new HiSeq orders to be for the 2500.
According to Charles Johnson, director of genomics and bioinformatics at AgriLife Research, the HiSeq 2500 will allow his facility to be more flexible, because it can either run the instrument in 'deep mode' for projects that require high sequencing depth, or in 'fast mode' for studies that do not need as much depth and could benefit from faster turnaround. "So in a very real sense, with the purchase of the 2500 we get a HiSeq 2000 and the equivalent of two MiSeqs," he said, with the caveat that the instrument cannot be run in both modes simultaneously. "Having the flexibility the HiSeq 2500 provides is key for us, given the array of groups we collaborate with."
AgriLife Research has a $200 million annual operating budget and employs about 550 PhD-level scientists. It is very active in agricultural research, both at its headquarters in College Station and at its 13 research centers across Texas that conduct breeding programs for sorghum, corn, cotton, vegetables, and specialty crops. In addition, it has research programs in many other areas, such as infectious disease and cancer.
About four years ago, the organization started to build out its genomics and bioinformatics capabilities, allowing it to apply genetic marker-assisted selection to accelerate its plant breeding programs, for example by selecting for traits such as drought tolerance.
In terms of sequencing, AgriLife Research's facility is currently equipped with the newly arrived HiSeq 2000, two GAIIx machines, and a Roche/454 GS FLX. Until now, the GAIIx machines were used "for almost everything," Johnson said, and the 454 was mainly applied to de novo genome assembly and metagenomic studies.
"I imagine that for the most part, we will be migrating everything to the HiSeq simply because the cost of the reagents [per base] is so much cheaper," he said, while the GAII will "mostly be used to deal with overflow."
According to Bill McCutchen, executive associate director of AgriLife Research, identifying markers associated with traits of interest allows breeders to exclude up to 80 percent of the plant material they would have used otherwise. "They don't have to do a lot of the excess breeding they did traditionally because they didn't know if that trait was present or not," he explained.
Genotyping-by-sequencing allows the researchers to quickly identify and analyze large numbers of genetic markers that are distributed across a plant genome, using a technique called restriction-site associated DNA sequencing, or RAD-seq.
In RAD-seq, restriction enzymes cut the genome at certain sites, and only areas around those sites are sequenced at high coverage, leading to a reduced representation of the genome. These so-called RAD tags can be used to identify SNPs and to map traits in the plants.
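The reduced-representation idea can be illustrated with a small in silico digest. This is only a minimal sketch under stated assumptions, not the facility's pipeline: the article does not name the enzyme or read layout used, so the EcoRI recognition site (GAATTC), the toy sequence, the 5-base tag length, and the function names `find_sites`, `rad_tags`, and `represented_fraction` are all hypothetical choices made for illustration. It also ignores details of real library preparation such as the exact cut position within the site, shearing, and adapters.

```python
# Toy in silico RAD digest: locate restriction sites and keep only the
# sequence flanking each site. All names and parameters are illustrative.

def find_sites(genome: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start)
        start = genome.find(site, start + 1)  # step by 1 to allow overlaps
    return positions


def rad_tags(genome: str, site: str, tag_len: int) -> list[tuple[int, str, str]]:
    """For each site, return (position, upstream flank, downstream flank)."""
    tags = []
    for pos in find_sites(genome, site):
        upstream = genome[max(0, pos - tag_len):pos]
        end = pos + len(site)
        downstream = genome[end:end + tag_len]
        tags.append((pos, upstream, downstream))
    return tags


def represented_fraction(genome: str, site: str, tag_len: int) -> float:
    """Fraction of bases that fall within a site or its flanking tags."""
    covered = [False] * len(genome)
    for pos in find_sites(genome, site):
        lo = max(0, pos - tag_len)
        hi = min(len(genome), pos + len(site) + tag_len)
        for i in range(lo, hi):
            covered[i] = True
    return sum(covered) / len(genome)


# Hypothetical toy "genome" containing two EcoRI sites.
ECORI_SITE = "GAATTC"
toy_genome = "ACGT" * 5 + ECORI_SITE + "TTGCA" * 4 + ECORI_SITE + "CCGGA" * 6

sites = find_sites(toy_genome, ECORI_SITE)
tags = rad_tags(toy_genome, ECORI_SITE, tag_len=5)
fraction = represented_fraction(toy_genome, ECORI_SITE, tag_len=5)

print("site positions:", sites)
print("tags:", tags)
print("fraction of genome represented:", round(fraction, 3))
```

Because only the flanks of each site are retained, the fraction of the genome that must be sequenced shrinks as sites become rarer or tags shorter, which is why the same sequencing output can be concentrated into higher coverage at those loci.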
The method is especially useful in agriculture because it does not require a reference sequence and because it is inexpensive. For many crop species, there is no reference genome available, Johnson explained, so targeted sequencing approaches such as exome sequencing don't work. "It's a huge advantage in plant breeding, where our wheat breeders [for example] will typically have 150,000 lines," he said, though not all of these will be genotyped by RAD-seq.
In addition, many crop species have "incredibly complicated genomes" that are often polyploid, and traditional SNP assays "are really not amenable to those species," he said.
The facility also favors RAD-seq over microarrays, which it rarely uses. Arrays are "definitely cost-effective if you have prior knowledge about the SNPs," Johnson said, "but actually, we have found using genotyping-by-sequencing is cheaper."
For a typical genotyping-by-sequencing experiment, he and his colleagues generate 100-base paired-end reads, using the read near the restriction site for SNP calling and the other read primarily to aid alignment. For less complex or well-annotated genomes, 50-base or 100-base single-end reads are sufficient.
The coverage needed depends very much on the species, he said, ranging from "pretty modest" for low-complexity diploid species to high for complex polyploid crops such as wheat, sugarcane, or cotton.
Johnson said that with the HiSeq, he expects the largest application of sequencing in his facility to be RAD-seq, including projects from users for whom RAD-seq was previously too expensive.
But the facility also serves a variety of other users, both within and outside AgriLife Research, who have different needs. For example, Johnson and his colleagues carry out many bacterial and viral sequencing projects for the Texas Veterinary Medical Diagnostic Laboratory. These projects do not require the same sequencing depth and could benefit from the high speed of the HiSeq 2500 in fast mode, he said.
The group also recently sequenced the genome of the Quarter Horse, the first horse genome analyzed by next-generation sequencing, and it participated in a tuberculosis project that involved sequencing more than 800 Mycobacterium tuberculosis genomes.
Many challenges encountered at the facility arise from this breadth of projects and collaborators. "Nearly every project has some aspects that require us to develop new methods or bioinformatics analysis tools," Johnson said, "but it makes for a very challenging and exciting environment to work in."
"Of course we have the same challenges that all NGS faculties have; we could always use more computer storage and bioinformatics staff," he said. "In general the greatest challenge for any new disruptive technology like NGS is understanding the nature and properties of the measurements and then coming up with the hypotheses and experimental designs that will fully utilize the technology, and of course having the computational, bioinformatic, and statistical tools to address those questions."