By Julia Karow
Genome centers large and small have added to their fleets of sequencers over the last year, and are applying their capacity to a variety of high-throughput projects, many of them disease-related.
Earlier this month at the Advances in Genome Biology and Technology conference in Marco Island, Fla., several centers — including the Broad Institute, Baylor College of Medicine, the Wellcome Trust Sanger Institute, Cold Spring Harbor Laboratory, the Yale Center for Genome Analysis, and the NIH Intramural Sequencing Center — provided updates on their current sequencing capacity and how they are putting it to use.
The Broad Institute traded many of its Illumina Genome Analyzers for HiSeq 2000s last year, and currently has 48 HiSeqs, 29 GAs, and two 454 GS FLXs in production, according to Tim Fennell, a member of the institute's genome sequencing platform team.
While the Illumina machines are mainly used for whole-genome resequencing and de novo sequencing, exome and other targeted sequencing, ChIP sequencing, and reduced representation bisulfite sequencing, the 454 instruments serve to sequence viral, fungal, and microbial genomes and for metagenomic projects.
In addition, the institute has three Life Technologies SOLiDs, four Ion Torrent Personal Genome Machines, one Pacific Biosciences RS, and one Helicos BioSciences sequencer that are not currently used in production.
The Broad has automated its library production, allowing it to capture and construct 800 to 1,000 whole-exome libraries per week, along with about 200 whole-genome libraries, almost 100 long mate pair libraries, and a small number of custom libraries, according to Fennell. All of these are constructed with indexed adapters now, he said, and 70 to 80 percent of sequencing runs are now multiplexed.
According to Sheila Fisher, assistant director of technology development of the Broad's genome sequencing center, the institute had performed 14,000 exome captures as of last September. It uses Agilent's SureSelect method for that, which it helped develop.
Broad researchers led by Andreas Gnirke have also reduced the GC bias of Illumina libraries, which lowers coverage of genomic regions with low or high GC content, by optimizing the conditions of the PCR-enrichment step. The new protocol is "almost as good as PCR-free library prep," Gnirke said, while at the same time "more forgiving and easier to automate."
Another source of GC bias in Illumina sequencing has been the cluster amplification, which under-represents GC-rich sequences, but a new kit that Illumina plans to introduce this spring "very much improves" their coverage, according to Chad Nusbaum, co-director of the Broad's genome sequencing and analysis program.
Though very little of the human genome is extremely GC-rich or GC-poor, those regions "contain a lot of important stuff," he said.
One area in which the Broad has put its sequencing factory to work is cancer, where the goal is to discover new cancer genes and pathways, molecular classification schemes, and, in the long term, therapeutic targets, he said.
The analyses include whole-genome sequencing, exome sequencing, transcriptome sequencing, and epigenome sequencing, Nusbaum said, although the institute has mainly focused on the first two so far.
Since starting next-gen sequencing of tumors in 2009, he reported, the institute has sequenced the genomes of about 150 tumor/normal pairs at 30-fold average coverage, and about 750 tumor exomes at 150-fold average coverage. So far, the researchers have found that the number of point mutations differs between cancer types and is especially high in UV- or smoking-related cancers. They have also seen a wide range of rearrangement patterns across and within tumor types. In multiple myeloma and prostate cancer, they have found new pathways and genes not formerly known to be involved in these types of cancer.
Whereas the depth of exome sequencing provides good sensitivity to detect mutations in impure, polyploid, and complex tumor samples, whole-genome sequencing provides structural information, Nusbaum said.
Even a "modest" number of sequenced tumors have provided the Broad researchers with "significant insight," he said, and over the next few years, a "very large number of cancers will be sequenced."
The Broad is also involved in exome sequencing projects funded by the National Heart, Lung, and Blood Institute and others, for which it is studying several thousand samples from cohorts with conditions including early-onset myocardial infarction, extremes in blood pressure, type 2 diabetes, autism, and schizophrenia. For early-onset myocardial infarction alone, it has completed more than 1,100 exomes, and has 500 more in the pipeline, according to Stacey Gabriel, co-director of the Broad's genome sequencing center.
Baylor College of Medicine
Baylor's Human Genome Sequencing Center is currently equipped with 30 SOLiD 4 instruments, four HiSeqs, two GAIIx, five 454 GS FLX, and seven ABI 3730 capillary sequencers, according to Donna Muzny, the center's director of operations. In addition, it has had a PacBio RS machine since last July and received three Ion Torrent PGMs last month. Five additional HiSeqs are on order, according to HGSC director Richard Gibbs, and the center will receive another Ion Torrent machine that one of its employees, Adam English, won during a contest at the AGBT meeting.
Baylor devotes about half of its sequencing capacity to exome and regional capture projects and has completed sequencing more than 4,000 of these samples to date, according to Muzny.
At present, it can perform more than 1,200 captures per month using NimbleGen in-solution reagents. Its main targeted sequencing projects are the Cancer Genome Atlas, the National Institute of Mental Health's autism study, the Cohorts for Heart and Aging Research in Genomic Epidemiology, or CHARGE-S, study, and the 1000 Genomes Project.
For the CHARGE-S project, which aims to identify genes underlying GWAS findings by sequencing, the researchers are capturing up to 2.2 megabases of DNA regions, which they plan to sequence in 5,000 samples using the SOLiD platform. In addition, for a liver cancer and colon cancer validation project, they are capturing 3 megabases of genomic regions, which they are sequencing on the Illumina GAII.
For the NIMH autism project, they are sequencing exomes of 1,000 cases and 1,000 matched controls using NimbleGen SeqCap EZ exome 2.0 reagents and SOLiD. In addition, they have designed another NimbleGen exome capture set, called VCRome, which comprises about 25,000 genes and a total of 46 megabases.
Asked when the center uses whole-genome sequencing and when exome sequencing, Muzny said this is "a daily question we ask ourselves," adding that "cost-effectiveness and data utility" drive their decision.
Sanger, Cold Spring Harbor, Yale, and NISC
Other centers have also built out their sequencing capacity over the past year. The Wellcome Trust Sanger Institute, for example, now has 20 HiSeq 2000 and 25 GAII, with more GAs to be replaced by HiSeqs in the future, according to Harold Swerdlow, the institute's head of sequencing technology.
The Sanger also has two GS FLX instruments and a PacBio RS, which it is currently testing. Its total data output has increased from an average of 600 gigabases per week in 2010 to 2 terabases per week now, he said.
HiSeqs are used in projects like the 1000 Genomes Project, the UK10K project, and various cancer, mouse, and malaria projects, while the GS FLXs are mostly used in pathogen de novo sequencing.
Cold Spring Harbor Laboratory now has four HiSeq 2000, 10 GAIIx, one GS FLX, and a PacBio RS. Its facility's throughput has increased from "a few" gigabases per month in 2007 to more than 2.5 terabases per month right now, according to Dick McCombie, a professor at CSHL. It is expected to be "closer to" 4 terabases with the same number of instruments this spring, once Illumina upgrades the throughput of the HiSeq.
The institute uses about three-quarters of its sequencing capacity to study the genetics of cognitive disorders, he said, including schizophrenia, bipolar disorder, and depression. For example, it is currently sequencing the complete genomes of families "highly burdened" with schizophrenia.
It has also compared in-solution exome capture kits from Agilent and NimbleGen and has found that "by and large, both exome capture kits are quite reproducible in what they capture or don't capture," according to McCombie. The institute will choose based on cost which one to use in the future, he said.
CSHL's two other main areas of study are cancer genetics, including prostate, pancreatic, and esophageal cancer, and generating de novo sequences of plant genomes, for example the wheat genome.
According to McCombie, one of the "biggest challenges" that remains is to be able to sequence a small number of genes in a large number of samples. "Sample purification and barcoding are not nearly as robust as sequencing capacity right now," he said.
Also, researchers need to start thinking about "how we do human genetics when it costs $1,000 or less to sequence a human genome," he said, which he believes is likely to happen in a year and a half to two years.
A relative newcomer to large-scale sequencing facilities is the Yale Center for Genome Analysis at the Yale School of Medicine, which was established as a core facility in late 2009. It is currently equipped with seven HiSeqs, 12 GAIIx, and one GS FLX but plans to replace more GAs with HiSeqs, according to the center's James Noonan, a professor of genetics at Yale.
The center's current data output is about 3.6 terabases per month, he said. About 70 percent of its projects are targeted sequencing, followed by RNA-seq, ChIP-seq, small RNA-seq, and genomic sequencing.
One of the challenges the center faces is that it has a "diverse user community" with both experienced and inexperienced users of next-gen sequencing data, making it important to offer customers training. It is also currently "severely understaffed" in informatics after having "clearly underestimated the informatics need," Noonan said.
The NIH Intramural Sequencing Center currently has one GS FLX, six GAII, and a couple of HiSeqs that it recently traded for two GAIIx, and which it expects to come online later this month, according to Jim Mullikin, the center's acting director. In addition, it has one 454 instrument that is primarily used for microbiome studies as well as microbial genome sequencing.
The center has so far generated a peak output of a terabase of sequence per month, but because the two HiSeq machines will be equivalent to 10 GAII in throughput, the center's overall throughput will double, he said, and will increase further with the upcoming HiSeq upgrade.
Exome sequencing using Agilent's SureSelect method is the center's primary application at the moment, and it has recently automated its library construction process.
It is involved in several exome-sequencing projects, including ClinSeq, the NIH's Undiagnosed Diseases Program, and a variety of PI-driven projects, for example in cancer.
Mullikin pointed out that the hardware life cycle of sequencers has become shorter, so a three-year amortization can no longer be assumed. Also, protocols from vendors have been changing frequently, demanding changes in sample tracking and the bioinformatic analysis.
He looks forward to better automated analysis pipelines for methods like ChIP-seq, RNA-seq, miRNA-seq, and whole-genome assembly. "Data generation is no longer the limiting step for most projects," he said.