Center for the Study of Biological Complexity
Name: Yuan Gao
Title: Assistant professor, Center for the Study of Biological Complexity and Department of Computer Science, Virginia Commonwealth University, since 2006
Experience and Education: Postdoc, Genetics, Harvard Medical School (working with George Church)
PhD, Computer Science, University of Memphis, 2001
MS, Computer Science, University of Memphis, 1998
MS, Biochemistry, University of Tennessee, 1995
BS, Biology, Beijing University, 1992
Yuan Gao joined the faculty of Virginia Commonwealth University last summer after a postdoc in George Church’s lab at Harvard, during which he worked on software for Church’s polony sequencing technology. Gao’s lab installed an Illumina Genome Analyzer in late March and will soon start putting it to use for a range of projects. In Sequence caught up with him recently to talk about his decision to acquire the instrument and his plans for using it.
What were you working on in George Church’s lab?
My work was mainly computational and algorithm design. I was mainly doing gene expression analysis and motif discovery, but near the end, I was working on the algorithm side for software for the ligation-based sequencing technology — polony sequencing — that George’s lab developed. So I became very familiar with the technology.
How did you make the decision to buy the Illumina sequencer?
After I left George’s lab, my plan was to continue the high-throughput sequencing approach to study genomics. VCU is very supportive of this as it aligns very well with their vision, and history, of maintaining a leading edge in next-generation sequencing.
After some study of the available second-generation technologies, we invited [Applied Biosystems], 454, and Solexa to come and give talks last October. At that time, we were trying to decide which one to get. I am very familiar with ABI’s SOLiD technology, as it is based on George’s work on polony sequencing. The technology has the advantage of being very accurate, especially in homopolymer regions. However, at the time, ABI told me they were looking at a 2008 delivery for the SOLiD instrument, which is of course too far away for me — I needed to spend my money. So I was looking at the other two available platforms, 454 and Solexa. Actually, I consulted many experts, including George Church, who hooked me up with some people who know the technologies well. As our core facility was also planning to upgrade [its] sequencing platform, we invited [the companies] to come and give presentations. I was impressed with the Solexa, and I also heard good things about them, so we decided to purchase a Solexa instrument.
The current read length of Solexa is short — 35 base pairs — and thus well suited for re-sequencing, gene expression analysis, and microRNA studies. Because my lab is mainly interested in these applications, this was a good decision for me.
At the same time, our core facility is more interested in de novo sequencing projects, so we decided to purchase a 454 GS FLX. The two instruments have different sweet spots. 454 is better at de novo sequencing, at higher cost and lower pace, and Solexa is mainly suited to resequencing, gene-expression profiling, and microRNA analysis. By ordering a Solexa for my lab, we [now] have complementary technology platforms. Our core also purchased an ABI 3730, and we are paying close attention to ABI’s SOLiD platform.
VCU has made it clear that we will maintain [our position in the sequencing arena] by acquiring the latest technology when it becomes available.
Where did the funding for your instrument come from?
From my startup funds, and also the university has chipped in a lot of money.
Is it true that the instrument has high computational needs?
It is, yes. Luckily, our Center for the Study of Biological Complexity is already an interdisciplinary place, and we have a high-performance computing center on the first floor of the Trani Life Science building, which is where my lab is located.
Once Solexa gave us the on-site preparation menu, we looked at the computational needs, and I said, ‘Gee, it needs high storage.’ So we actually ended up spending another $35,000 for a high-performance Sun storage machine with 24 terabytes of capacity. We also bought an 8-node computer specifically for this machine, so we can store the images, because one run will generate about one terabyte of data. We will have to get the one-gigabyte connection between the Solexa machine and our downstairs storage machine. The high-performance computing center also recently purchased a 596-node Linux Beowulf cluster, which includes 17 terabytes of internal disk storage, and I have access to that as well.
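As a rough illustration of the storage figures in this answer, the short sketch below estimates how many runs fit on the storage machine and how long one run might take to copy over the network. The run size (about 1 terabyte) and storage capacity (24 terabytes) come from the interview. The link speed of 1 gigabit per second, the use of decimal units, and the function names are assumptions made only for this example, and real transfer rates would likely be lower.

```python
# Back-of-the-envelope storage and transfer estimate.
# From the interview: about 1 TB of data per run, 24 TB of storage.
# Assumed (not from the interview): decimal units and a network link
# that sustains 1 gigabit per second with no protocol overhead.

BYTES_PER_TB = 10**12   # decimal terabyte
BITS_PER_GBIT = 10**9   # decimal gigabit


def runs_that_fit(storage_tb: float, run_tb: float) -> int:
    """Number of whole runs that fit on the storage machine."""
    return int(storage_tb // run_tb)


def transfer_hours(run_tb: float, link_gbit_per_s: float) -> float:
    """Hours needed to copy one run over the link at its full rate."""
    bits = run_tb * BYTES_PER_TB * 8
    seconds = bits / (link_gbit_per_s * BITS_PER_GBIT)
    return seconds / 3600


if __name__ == "__main__":
    print("Runs that fit on 24 TB:", runs_that_fit(24, 1))
    print("Hours to copy one run:", round(transfer_hours(1, 1), 2))
```

Under these assumptions the storage machine holds 24 runs, and copying a single run takes a little over two hours, which suggests why both the storage purchase and the network connection mattered.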
What are you planning to use the instrument for?
I have my plate full. We have quite a few projects already lined up.
We are interested in whole-genome resequencing. The one I am especially interested in is the [rhesus macaque] monkey genome, which was published last month in Science by the Rhesus Macaque Genome Sequencing and Analysis Consortium, also known as the Monkey Genome Consortium. Right now, we have the human, chimpanzee, and now the monkey genome. We want to identify what’s common between us, and what’s different. Since we already have the reference monkey genome, I think the immediate thing we want to do is resequence more individual monkeys using the Solexa platform. Monkeys have been used in studies for HIV, and for flu, and it’s a primate, so I think it has high potential. In the Science study, they only resequenced a small region of 16 monkeys, eight of Chinese origin and eight of Indian origin, which means there is lots of work to be done. So that’s something that is a little bit longer-term for me, but I think I will commit myself to that project.
Also, here at VCU I am collaborating with Professor Steve Fong from chemical and life science engineering on another, longer-term project involving microbial genome sequencing. We are trying to engineer microbes that show high efficiency in conversion of cellulose to biofuels, like ethanol. Then we are going to sequence them and identify the genes, and the mutations in the genes, that are responsible. Our goal is to identify the most suitable microbe for biofuel production through genome sequencing by informatic analysis and computational modeling. Another goal is to engineer the selected microbe for biofuel production using computational strain-design algorithms and genetic engineering techniques. Solexa sequencing is very suitable for this because we already have a reference genome.
Another project is sequencing cancer samples versus normal ones. I am going to collaborate with Professor Carleton Garrett from the pathology department at VCU. They have different cancer samples, and we are going to use an approach similar to the one used by a group at Johns Hopkins, who published their results in Science last fall, to identify SNPs, mutations, and new cancer genes.
I am also working on a hybrid approach for de novo sequencing, leveraging the deep coverage of Solexa and the long reads of the 454 FLX. Currently, I am collaborating with Professor Greg Buck at CSBC and the microbiology department at VCU on sequencing Phytomonas and T. dionisii. We have already gotten five runs of 454 on Phytomonas and will do a run of Solexa this week. Of course, the next thing is to figure out how to best use the data from the two platforms. We are doing research in this area, and I am sure people around the country are thinking of the same thing: what is the optimal mix of reads to reach the goal of de novo sequencing with acceptable quality?
I am also working on sequencing the yellow perch genome in collaboration with Professor Bonnie Brown at the biology department at VCU. We will also use a hybrid approach: we will use a few runs of 454 to provide some assembly, then we will use Solexa for deep coverage.
With Professor Jun Zhu at the Institute for Genome Sciences & Policy at Duke University, we are going to study alternative splicing and microRNA. Instead of using a microarray, we are trying to use Solexa sequencing. We want to know which isoforms are expressed in different tissues, normal and diseased, or stem cells versus differentiated cells, and we also want to correlate specific isoforms with disease or behavior. Using a microarray, it’s hard to quantify the expression level of different isoforms. But using the Solexa platform will give us absolute numbers to compare how many copies of this isoform are expressed, and how many copies of that isoform are expressed.
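The idea of comparing isoforms by counting sequenced copies can be sketched in a few lines. This is only an illustration, not the lab's pipeline: it assumes each read has already been assigned to exactly one isoform by some upstream alignment step that the interview does not describe, and the isoform labels and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def count_isoform_tags(assignments: Iterable[str]) -> Counter:
    """Count how many reads were assigned to each isoform label."""
    return Counter(assignments)


def tags_per_million(counts: Counter) -> Dict[str, float]:
    """Scale raw counts to tags per million so samples can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {iso: n * 1_000_000 / total for iso, n in counts.items()}


def isoform_ratio(counts: Counter, a: str, b: str) -> Optional[float]:
    """Ratio of isoform a to isoform b, or None if b was never seen."""
    if counts[b] == 0:
        return None
    return counts[a] / counts[b]


if __name__ == "__main__":
    # Hypothetical read assignments for two isoforms of one gene.
    reads = ["geneX.1", "geneX.1", "geneX.2", "geneX.1"]
    counts = count_isoform_tags(reads)
    print(counts)
    print(tags_per_million(counts))
    print(isoform_ratio(counts, "geneX.1", "geneX.2"))
```

Because the counts are digital, the same normalization can be applied to each tissue or cell type, and the resulting numbers can be compared directly between samples.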
A study of microRNA is another collaboration with Jun. Right now, we want to identify some new microRNAs in a cell line that produces insulin when stimulated with glucose. We also want to ask the question, ‘Is there any negative feedback loop to help stabilize expression levels that uses the microRNA mechanism?’ This is much cheaper [on the Solexa] than, for example, SAGE and MPSS. Using the Solexa, we can actually do about 5 million tags per sample for only a few hundred dollars. We have eight channels, and in each channel, we can do a sample.
We also want to study epigenetics; that’s one thing I am very interested in. We want to do whole-genome methylation studies. I am planning to work with Billy Li from George [Church’s] lab on this in the near future. Of course, we are also interested in chromatin immunoprecipitation sequencing to identify gene regulatory motifs.
Another project I want to mention is a parasite we want to sequence in collaboration with Professor Ghislaine Mayer at VCU. We are going to sequence Plasmodium ovale and Plasmodium malariae, two parasites that cause malaria. They are closely related to Plasmodium falciparum, but not much is known about them.
I am also collaborating with George Church at Harvard on his Personal Genome Project and will contribute the Solexa platform to this.
I think the possibilities right now are endless. The technology really opens a whole new horizon, I think.