Associate Director, Senior Group Leader, Genome Technology
Genome Institute of Singapore
Name: Yijun Ruan
Position: Associate director (since 2003) and senior group leader (since 2002), genome technology, Genome Institute of Singapore
Experience and Education:
— Director, core genomics and applications, Large Scale Biology, 1999-2002
— Senior scientist, genome technology, Monsanto, 1996-1999
— PhD in molecular biology, University of Maryland, 1994
— BS in microbiology, Huazhong University, China, 1983
At the Genome Institute of Singapore, Yijun Ruan and colleague Chialin Wei and their team have been using their paired-end ditag library preparation method in combination with second-generation sequencing for applications ranging from transcriptome sequencing to cancer genome analyses.
The institute currently has six Illumina Genome Analyzers, four ABI SOLiD systems, and one 454 Genome Sequencer FLX in house.
In Sequence met with Ruan during the Personal Genomes conference at Cold Spring Harbor Laboratory this month to discuss his work.
Tell me about the Genome Institute of Singapore.
The Genome Institute of Singapore was founded in 2001 with funding from the Agency for Science, Technology and Research, a government organization. It occupies a seven-floor building in the Biopolis, which is probably the largest biomedical research hub in Asia. Right now, the institute employs around 300 people.
When did you join the institute, and what kind of research do you conduct?
I came to the Genome Institute of Singapore in 2002, soon after it got started, and the clear mission was to set up a strong DNA technology program. We started with full-length cDNA cloning and sequencing — the goal was to characterize the transcriptomes of cancer cells and stem cells. But we realized that cloning and sequencing full-length cDNA was just too cumbersome, so we thought about a much more efficient way to get to the same result.
We initially thought about using a tag-based sequencing strategy, such as serial analysis of gene expression. But SAGE is not good enough, because it will only map a short piece of each cDNA to represent the entire molecule. I thought that if we could map the 5’ and 3’ tags to represent a cDNA, that would be a much better way, combining, in a way, the tag-based approach with the full-length cDNA sequencing approach. That’s how we came up with paired-end ditag, or PET, sequencing.
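As a rough illustration of the idea described above, the sketch below reduces a cDNA sequence to its two ends and pairs them. It is a toy example, not the lab's protocol: the `make_ditag` helper, the `Ditag` type, and the default tag length are all assumptions made for illustration.

```python
from typing import NamedTuple


class Ditag(NamedTuple):
    """A pair of short end tags standing in for one full-length molecule."""
    five_prime: str
    three_prime: str


def make_ditag(cdna: str, tag_len: int = 18) -> Ditag:
    """Keep only the 5' and 3' ends of a cDNA sequence.

    Hypothetical helper; tag_len is an assumed value, not one taken
    from the interview.
    """
    if tag_len <= 0:
        raise ValueError("tag_len must be positive")
    if len(cdna) < 2 * tag_len:
        raise ValueError("sequence too short for two non-overlapping tags")
    # First tag_len bases mark the start, last tag_len bases mark the end.
    return Ditag(cdna[:tag_len], cdna[-tag_len:])


# Toy sequence with a short tag length so the result is easy to read.
example = make_ditag("AAAACCCCGGGGTTTT", tag_len=4)
print(example.five_prime, example.three_prime)
```

Mapping both tags back to a reference would mark where a transcript starts and ends, which a single tag per molecule cannot do.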
We started to characterize transcriptomes, but we realized that our approach is not only limited to cDNAs. It is actually much more broadly applicable to other DNA fragment analyses, such as analyses of chromatin immunoprecipitation-enriched DNA fragments to study transcription factor binding sites. That’s how we established ChIP-PET.
We actually set the precedent and pioneered this entire field. At first, ChIP-chip was the dominant method in this field, and lots of people did not believe that sequencing could actually do this, no matter how deep you sequence. Our lab really demonstrated — not only in the wet lab but also by our bioinformatics analyses — that you can precisely map any site using sequencing.
Initially, we used capillary electrophoresis sequencing to analyze concatenated PETs, so with each read, we got 20 or 30 PET units. That’s relatively expensive, but it can be done. Then, when the 454 method came out, we adapted our method to that. The real explosion happened when the Solexa platform came out. It just increased the depth to tens of millions instead of just hundreds of thousands of sequence reads per run.
And then we thought about how we could further push the envelope and do something that is more unique to our approach. That’s how we came up with a method to study chromatin interactions on a genome-wide basis, called ChIA-PET. It was quite difficult to establish — it probably took about two years to really work out all the details. I think this really represents the future in the field of studying how proteins interact with the genome to regulate genome functions, both transcription and DNA replication.
We are also participating in genome sequencing projects, as well as cancer genome sequencing projects, in both cases utilizing the advantages of paired-end ditag sequencing.
The genome architecture information we obtain with our method will have an impact in two directions. One is, it will have immediate clinical value. For example, it can help physicians determine before birth whether a child has a normal genome structure. Another example is cancer: the structural characteristics of an individual’s tumor can help physicians detect whether residual tumor cells are still present or whether the tumor has been completely wiped out following cancer treatment.
The other direction is in personal genome sequencing. If a new sequencing method comes out, like Pacific Bio’s, then everything will be changed. But if not, my belief is that current technology can provide personal genome sequencing. It’s just a matter of reducing the cost and making it more efficient. And then, our paired-end ditag sequencing approach will play quite an important role, besides the role it will play for the ChIA-PET analysis.
How are you equipped with sequencing systems, and what do you use each platform for?
Regarding next-gen equipment, we have one 454 GS FLX, six Illumina Genome Analyzer IIs, and four ABI SOLiD 2.0 systems. We have also kept four ABI 3730 units — we used to have 10 — for QC and development use.
I think the three currently available [second-generation] platforms all have their own strengths, and they all have their own disadvantages.
I think the 454 is a pretty nice system. If you go by time, say, per week or per month, it pretty much has the same output as a Solexa or a SOLiD. The only issue is the reagent cost. Compared to the other two platforms, I would say it’s on the order of 10-fold more expensive. That’s a big deciding factor — if for the same amount of money, I can do 10-fold more things, or if I can spend 10-fold less money to get the same job done, of course I would go that direction.
454’s strength is really in the several-hundred base pair reads, which make the system unique for metagenome sequencing. For example, we have a mosquito viral metagenome project, and I think 454 is uniquely suited to that, and no other approach can match it for this project.
The output of the Solexa and SOLiD systems is about the same. It’s all tag-based sequencing, and they both can do paired-end reads. Solexa came into this market earlier, so it is relatively well-established compared to SOLiD, which has kind of a catch-up game to play.
But besides that, beyond the technology, it’s really about the business. For example, we think ABI has much better tech support for the SOLiD than Illumina for the Genome Analyzer. That’s a factor we also consider before making a decision.
Application-wise, I think Solexa is more flexible than the SOLiD. If you want to test a few things, Solexa is a much better way to go, because you can do a one-lane test to see what the results are, and then you can decide what to do next. Solexa is also very good for ChIP-Seq studies, because you don’t need that much data: one lane can give you 1 or 2 million reads, which is enough to be informative.
The SOLiD is kind of bulkier, but if you are sure about something you want to do in a big way, for example for cancer genome sequencing, we think it’s a really good fit. That doesn’t mean that Solexa cannot do that — it’s just more flexible, so we use Solexa to do more diverse projects, and SOLiD is more dedicated to larger projects.
What kinds of studies are you planning?
We want to study the function and the structure of the genome. In terms of function, we are studying how the transcription machinery is regulated and coordinated. Most of the focus is on stem cells and cancer cells, and we think that ChIA-PET will be the main technical approach for us, because it yields everything that ChIP-Seq can provide, and much more. For example, ChIA-PET can also get [long-range] interactions, which is very valuable information no other technology can provide. Of course we also use other techniques, such as ChIP-Seq and transcriptome sequencing. But we are mostly focused on paired-end sequencing – that’s our strength, that’s our core competency.
In terms of genome structure, in cancer genome sequencing as well as personal genome sequencing, we also use the paired-end sequencing approach to push the idea of personal karyogenomics.
Where do you see the sequencing field headed, and what are the greatest challenges ahead?
Having witnessed the recent developments in sequencing technologies, you just cannot underestimate what might happen. Next year, Pacific Bio may have some breakthrough. And also, Helicos, if its system works out, could provide significant incremental improvement to this field, simply because they could provide much higher throughput than the currently available platforms.
The greatest challenge will be the data analysis, and related IT improvements — how to manage the data flow, how to increase the data storage, and how to develop new algorithms.
And I think more and more, we will understand how structural variations in the human genome contribute to disease. … In the future, if everyone’s genome sequence becomes available — and every individual already has a complete set of phenotypes — that would be the ideal situation, the paradise for medical research.