By Julia Karow
As Illumina sequencers in particular, as well as the 454 platform, have become the workhorses for a variety of sequencing projects at the Broad Institute, its scientists have fine-tuned their sample-prep methods to keep the machines fed at all times.
At the Evolution of Next-Generation Sequencing conference in Providence last month, two members of the Broad's process development team shared some of the changes that have made sample prep faster, cheaper, and more high-throughput.
As of late September, the Broad had 43 Illumina Genome Analyzers and 31 HiSeq 2000 instruments in production, as well as two 454 GS FLX — down from 10 GS FLX in 2008. By the end of the year, the institute expects to have a total of 51 HiSeqs installed, in exchange for fewer GAs.
In addition, the Broad has three Applied Biosystems SOLiD instruments (down from eight last year), one Helicos Genetic Analysis system, and one Polonator, none of which it uses in sequence production. It has also deployed early-access instruments from Pacific Biosciences and Life Tech's Ion Torrent, and it still maintains four legacy Applied Biosystems 3730xl capillary sequencers.
According to Niall Lennon, assistant director of process development at the Broad, Illumina sequencing, due to its low cost and high yield, has become the institute's "technology of choice" for human whole-genome and targeted resequencing, including exome and transcriptome sequencing; ChIP-seq; DNA methylation sequencing; studies of genome structure; as well as whole-genome sequencing of non-human organisms ranging from microbes to mammals.
The challenge, he said, is that sample-prep methods and read-coverage requirements differ for many of these applications. Also, because of the high throughput of the HiSeq — currently about 250 gigabases per run — increasing numbers of samples need to be prepared for each run.
For example, Lennon said, the institute is currently funded to do several exome sequencing projects comprising a total of 24,000 human exomes. It is using Agilent's SureSelect to capture exonic DNA, a method it co-developed with Agilent.
Right now, each exome requires two or three lanes of sequencing on the GAIIx, he said, but the hope is to be able to run one or two exomes per lane on the HiSeq.
The projects include, for example, a 6,000-exome genome-wide association study follow-up for inflammatory bowel syndrome; a 5,200-exome GWAS follow-up to study breast cancer risk; a 3,200-exome study of early-onset myocardial infarction; a 2,600-exome type II diabetes study; a 1,600-exome study of cardiovascular disease; a 1,000-exome autism study; and a Cancer Genome Atlas study on more than 800 exomes for ovarian, glioblastoma, lung, renal, leukemia, adenocarcinoma, and gastric cancer.
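As a rough illustration of the sequencing load behind these projects, the lane arithmetic can be sketched as follows. The sketch uses only the figures Lennon quoted (24,000 exomes, two or three GAIIx lanes per exome, and a hoped-for one or two exomes per HiSeq lane). The variable names are illustrative, and the estimate ignores reruns, failed lanes, and anything else not stated in the talk.

```python
import math

TOTAL_EXOMES = 24_000  # total across the funded exome projects

# GAIIx today: each exome takes two or three lanes.
gaiix_lanes = (TOTAL_EXOMES * 2, TOTAL_EXOMES * 3)

# HiSeq target: one or two exomes fit in a single lane.
# Best case is two exomes per lane, worst case is one.
hiseq_lanes = (math.ceil(TOTAL_EXOMES / 2), math.ceil(TOTAL_EXOMES / 1))

print(f"GAIIx lanes needed: {gaiix_lanes[0]:,} to {gaiix_lanes[1]:,}")
print(f"HiSeq lanes needed: {hiseq_lanes[0]:,} to {hiseq_lanes[1]:,}")
```

Under these assumptions, moving the same projects to the HiSeq would cut the lane count by a factor of two to six, before counting the larger yield of each HiSeq run.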
Meantime, the 454 GS FLX, owing to its long reads and short run time, currently still has a place at the Broad for the de novo assembly of small genomes, such as viral, microbial, and fungal genomes; for metagenomic projects, such as 16S profiling or virome profiling; and for some targeted sequencing, such as HLA typing, gap-filling projects, and validation of cancer mutations. However, Lennon said that the institute plans to assess other, "third-generation" sequencing platforms for some of these applications in the future.
Because each sample sequenced on the 454 requires only modest coverage, large numbers of samples need to be prepared for each run. Samples per project run in the thousands: for example, in one study, Broad researchers are sequencing the genomes of more than 10,000 viruses; in a human microbiome project, they are profiling 16S ribosomal DNA in more than 2,000 samples; and in another study, they are HLA-typing more than 3,000 HIV-infected patients.
Standard protocols are ill-equipped to process these staggering numbers for either platform. For example, a technician using these methods can only prepare six 454 sequencing libraries in two days, or 12 Illumina libraries per week, or around 12 DNA captures per week. Also, sample multiplexing is limited with standard protocols, it is easy to confuse or cross-contaminate samples, and the cost and DNA input requirement per sample are high.
Over the last several years, the Broad has therefore made several changes, starting with the 454 and now also for the Illumina, to increase the throughput and consistency of sample prep, while decreasing the cost.
Overall, the institute has been using process optimization programs developed in other industries, such as Six Sigma and 5S, to help it standardize workflows and create a culture where every operator feels responsible for making improvements and recognizing problems, Lennon said.
Samples are tracked by barcoding all tubes and plates and by using a laboratory information management system to record all sample transfers.
One "key improvement" that Lennon cited for increasing throughput and decreasing variability and cost has been to standardize on a single automated liquid-handling platform — the Agilent Bravo — which the institute now uses for all transfer steps, reaction clean-ups, and size selections. This not only allowed it to get a discount on the equipment — it currently has 12 Bravos — but also to train many people to program the instrument, and to exchange spare parts between machines. In a next step, the Broad is adding plate stackers, he said.
The first step in preparing DNA for sequencing is to shear it to the correct size. In order to increase throughput and decrease sample loss, the Broad has moved to using 96-well plates for acoustic shearing on the Covaris system. According to Kristen Connolly, a senior process development associate at the institute, the size distribution of the output DNA is "much tighter than with other shearing methods." To check the quality of the DNA, the Broad has also recently switched from the Agilent Bioanalyzer to the Caliper GX, which allows for plate-based QC.
In order to automate clean-up and DNA size-selection steps — processes that require spin columns and gels in the original protocols, and are therefore hard to automate — the Broad has moved them onto paramagnetic solid phase reversible immobilization, or SPRI, beads from Beckman Coulter Genomics. The advantage of the beads is that they are compatible with liquid-handling systems and can be applied to all six clean-up steps in the original Illumina protocol, Connolly said.
However, she noted that the SPRI beads can only select DNA in the 100- to 200-base pair range, so institute researchers still run size-selection gels for other fragment lengths, or if they need a tighter size distribution.
For that purpose, they have implemented Sage Science's Pippin Prep, a gel-based automated DNA size-selection and collection instrument. The Broad currently has eight Pippin Prep instruments, allowing it to do almost 500 size selections per week.
In order to reduce the loss of GC-rich DNA during the PCR step of the Illumina sample-prep process, the institute has also altered PCR conditions by using additives and a special polymerase and by optimizing cycling, she said.
Multiplexing samples was especially important in order to keep the 454 platform competitive "despite its high run cost," Lennon said, so the Broad introduced molecular barcodes that can be added "at various stages of the process."
With the increased throughput of the Illumina HiSeq, barcoding or indexing is also becoming more important for that platform, he added. Barcoding also helps to detect contamination, Connolly said. For the Illumina, the institute has devised 8-base barcodes that can be introduced early in the process.
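To make the idea concrete, here is a minimal sketch of how barcoded reads from a pooled run might be sorted back to their samples, and how reads carrying an unassigned barcode can flag possible contamination. It is not the Broad's pipeline: the three 8-base barcode sequences, the sample names, and the assumption that the barcode sits at the start of each read are all hypothetical and chosen only for illustration.

```python
from collections import Counter

BARCODE_LENGTH = 8

# Hypothetical 8-base barcodes; the actual sequences are not given in the talk.
SAMPLE_BARCODES = {
    "ACGTACGT": "sample_A",
    "TGCATGCA": "sample_B",
    "GATCCTAG": "sample_C",
}


def demultiplex(reads, barcodes=SAMPLE_BARCODES):
    """Count reads per sample; tally reads whose barcode is not in the pool."""
    counts = Counter()
    unexpected = Counter()
    for read in reads:
        tag = read[:BARCODE_LENGTH]  # assumes the barcode leads the read
        sample = barcodes.get(tag)
        if sample is None:
            unexpected[tag] += 1  # possible contamination or sequencing error
        else:
            counts[sample] += 1
    return counts, unexpected


reads = [
    "ACGTACGT" + "TTGACCA",
    "ACGTACGT" + "GGCATTA",
    "TGCATGCA" + "CCATGGA",
    "CCCCGGGG" + "ATATATA",  # barcode not assigned to any sample in this pool
]
counts, unexpected = demultiplex(reads)
print(dict(counts))
print(dict(unexpected))
```

A production demultiplexer would usually also tolerate a small number of mismatches in the barcode, which this sketch leaves out for brevity.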
Due to all of these changes, a single technician is now capable of generating 96 libraries for the 454 in two days, and of performing almost 800 DNA captures per week, or about 3,200 per month, a number that is "going to continue to rise," Lennon said.
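Set against the standard-protocol figures cited earlier (six 454 libraries in two days and around 12 DNA captures per week per technician), the gains work out as in the short calculation below. It treats "almost 800" as 800, so the capture figure is an upper estimate, and it assumes four working weeks per month.

```python
def fold_change(before, after):
    """Return how many times larger the new throughput is than the old one."""
    return after / before


# 454 libraries per technician in two days: 6 before, 96 now.
libraries_454_fold = fold_change(6, 96)

# DNA captures per technician per week: about 12 before, almost 800 now.
captures_fold = fold_change(12, 800)

# Weekly captures scaled to a month, assuming four weeks.
captures_per_month = 800 * 4

print(f"454 libraries: {libraries_454_fold:.0f}-fold")
print(f"DNA captures: about {captures_fold:.0f}-fold")
print(f"Captures per month: {captures_per_month:,}")
```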
According to Connolly, the institute has scaled up from 12 Illumina libraries per technician per week to more than 1,000 Illumina libraries per week, and with "a minimal capital purchase," it could generate more than 3,800 Illumina libraries per week by early next year.
In terms of multiplexing, more than a hundred samples are now sequenced in almost every 454 run, Lennon said. The cost per 454 sample has been reduced by up to 40-fold, while DNA input requirements for a fragment library have been reduced from 3 micrograms to between 100 and 200 nanograms.
Regarding the future of 454, he said, there will be "continued need" for small-scale sequencing that requires low coverage for many samples, and an increased need for medical resequencing — for example, to validate cancer mutations — and faster turnaround times. In addition, there will be a need for special libraries for genome finishing.
As for the Illumina platform, he said there will be an increase in demand for whole-genome shotgun libraries. The Broad will also implement barcoding and multiplexing for that system on a large scale. In addition, it plans to automate the construction of cDNA libraries in high throughput, and to work on specialized library construction from low amounts of input DNA.