Senior research scientist
At A Glance:
Name: Jared Roach
Title: Senior research scientist, Institute for Systems Biology
Educational Background: 1990 BS Cornell University, biology
1998 MD, University of Washington, Seattle
1999 PhD, University of Washington, Seattle, immunology
Jared Roach, a scientist at Seattle's Institute for Systems Biology, is involved in a project to integrate transcription-factor messages obtained from both microarrays and massively parallel signature sequencing (MPSS) into a comprehensive database of transcription factor expression in humans.
To gain this global view of transcription factor transcriptomics, Roach and fellow researchers at the ISB decided to explore the altered expression of macrophages following stimulation by lipopolysaccharide, an endotoxin that triggers a cellular response similar to that elicited by many known pathogens.
Roach claims to have played a role in the development of the widely known method of whole-genome shotgun sequencing, and in his newest pursuit he has embraced technologies that are often thought of as rivals — Solexa's MPSS and Affymetrix's GeneChip — and sutured them together for more successful experiments. After he presented some of the yet-to-be-published data from the ISB project at the Northwest Gene Expression Conference held in Seattle two weeks ago, BioArray News spoke with him to learn more about his work at the ISB, and why he believes that two technologies are better than one.
Maybe you could tell me about your background and how you got involved with the Institute for Systems Biology.
I was an undergraduate at Cornell University, where I studied microbiology and chemistry. And I had long been interested, even prior to college, in understanding gene regulation in gene networks, transcriptional networks — in [short], how cells think. One of the important [ideas] in cellular logic is how the [elements] fit together — and a metaphor for that is thinking of the cell as an integrated circuit, imagining that there's some circuit pattern that's ultimately knowable. To some extent my research grants and goals are related to improving our understanding of cellular logic, or working towards that vision. I pursued an MD/PhD at the University of Washington, and my PhD mentor was Leroy Hood. During my PhD I studied a number of subjects, including how to sequence the genome. I was [also] the originator of a modern strategy, pairwise end-sequencing, which has come to be known as whole-genome shotgunning — and most microbial genomes today have been sequenced using that methodology.
Subsequent to that, I did a year of residency at the University of Utah in internal medicine. And then I returned to Seattle to help form part of the core computational biology group at the Institute for Systems Biology, and this was enabled in part by my previous relationship with Lee Hood at the institute.
Tell me more about the project you are working on.
I am currently working largely on studying macrophages. Macrophages are a great mammalian model system for studying cellular logic. In some senses, the greatest advances in detailing cellular logic are going to be made in prokaryotes and in simple organisms like yeast. But as a medically oriented person whose goal is basically to tie cellular research to the bedside, the clinician is ultimately drawn towards working on a system which is as medically relevant as possible, in my case a human system with insights for mouse systems.
A nice thing about macrophages is that, as part of the human organism, they are among the cells that best retain their cellular logic function when they are isolated, as compared to many cells that are part of tissues and organs. So we can gain a lot of insight that is perhaps more translatable to the actual physiological function of a cell that's part of the hematopoietic system and the blood-borne cells. But within that context I am not only trying to understand the basic biology of macrophages, but also trying to develop the techniques, largely computational, for using the high-throughput tools we have available today, to demonstrate in general how we can do systems biology and, in the case I talked about [in Seattle], transcriptomics in cells.
But why specifically is this important?
Well, tuberculosis is maybe the number one or number two infectious disease killer in the world, together with AIDS. And vaccines are arguably the most important part of preventive medicine, which is by far the most important part of medicine we have as clinicians. So, working in a medically relevant system, doing basic science related to both vaccine development and tuberculosis prevention and ultimately treatment, it's relatively hard to find a more important clinical area to be working in right now. Not only that, but the techniques and tools that I have been using and developing for this system should be applicable to a variety of systems.
Are the tools that you are using being used by a variety of researchers right now, or are you breaking these newer technologies in with your research at the ISB?
I think at the Institute for Systems Biology we are both collaborators, in that we believe in using what's available, and innovators, in that we believe in breaking new ground and leadership, both in terms of technology development and in terms of developing tools for computational biology in areas where it's necessary. And that's sort of a paradigm for my research: it's largely driven by biological questions, and as I come across questions and needs for which there are either no algorithms or only inadequate computational biology tools, we develop them. An example of that would be MPSS, where there were essentially no statistical tools available to evaluate the significance of differences between measurements. So we developed a statistical model, and that was published earlier this year.
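[The significance question Roach describes — whether the tag counts for a transcript differ between two MPSS libraries, given the library sizes — can be sketched with a simple two-proportion test. The sketch below is purely illustrative: the function name and the normal approximation are ours, not the model published by Roach and Stolovitzky's group, which is more elaborate.]

```python
import math

def mpss_tag_pvalue(count_a, total_a, count_b, total_b):
    """Two-sided p-value for a difference in tag abundance between two
    MPSS libraries, using a normal approximation to the two-proportion
    test.  Illustrative only; not the published statistical model."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    # pool the two libraries to estimate the null proportion
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # two-sided tail probability of the standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# e.g. 30 tags in a library of 1,000,000 vs. 5 tags in 1,200,000
p = mpss_tag_pvalue(30, 1_000_000, 5, 1_200_000)
```

[Rare tags make the normal approximation shaky; an exact test would be preferable at very low counts, which is part of why a dedicated model was needed.]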
Where was that published?
It was published in the Proceedings of the National Academy of Sciences, and it was developed by Solexa, IBM and Gustavo Stolovitzky's group, which has recently been renamed the Functional Genomics and Systems Biology group.
What role did you play in this?
I co-led it along with Gustavo. We worked closely together with that statistical model.
And during your presentation you said that you weren't only using MPSS, but Affymetrix GeneChips as well. Is that true?
Yes, one of our goals was to do comprehensive transcriptomics at a sensitivity that is unprecedented for anything other than quantitative PCR. Quantitative PCR is expensive and time-consuming enough that it's not really a comprehensive technology — you can't really use it to study all the transcripts in a cell.
So we picked MPSS and Affymetrix as two orthogonal technologies, orthogonal in the sense that they're sort of technologically different and bioinformatically different in terms of how they assay transcripts. [We used] a combined approach which utilized the data that we acquired from both of these technologies from macrophages that were stimulated with lipopolysaccharide, which is a classic innate immune stimulus, to study the timetable transcription responses to LPS.
The MPSS data in particular allowed us to quantify the transcriptional response. With MPSS we could see, for example, the exact level of transcripts per cell, so we could say, 'this transcript is present at three copies per cell.' With hybridization array technology, unless you have previously calibrated the response of the probes and probe sets, you can't really tell much about the absolute levels of the transcripts you're looking at.
You see, if you do two separate arrays you can look for the difference, but you don't necessarily know whether you are looking at a relative difference between 3,000 copies per cell and 1,000 copies per cell, or between three copies per cell and one copy per cell. And so we were able to calibrate the level of transcripts for most messages that we were interested in using the MPSS data.
Which technology did you find superior for your work?
In some sense a lot of people are naturally led to ask, when they hear that we used both technologies, 'Which is better?' or 'Compare and contrast them,' thinking that at the end of the day the goal is to pick one. But I think my ability to do research using both of these technologies would probably be about 10 times better than it would have been if I had used either one alone.
I think the secret in the closet for both of these technologies, and for that matter every technology that's out there, is the difficulty of bioinformatics interpretation of the data. And these difficulties far outweigh any difficulties relating to aspects like technological innovation, or chip density, or sensitivity. If at the end of the day you don't have a probe — you don't know where it is or what gene it corresponds to, or it's somehow misannotated — that's worse than bad; it may be misleading. And so for different reasons both of these technologies have these large bioinformatics gaps, and they are largely different gaps. And so if each system is blind to ten percent of the genome, together you see 99 percent of the genome.
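[Roach's blind-spot arithmetic works out if the two platforms' gaps are independent, so that a gene is missed only when both platforms miss it. A minimal check of the numbers, under that independence assumption:]

```python
blind_affy = 0.10   # fraction of the genome one platform is blind to
blind_mpss = 0.10   # fraction the other platform is blind to

# assuming the blind spots are independent, a gene goes unseen only
# when both platforms miss it
missed = blind_affy * blind_mpss   # 0.01
coverage = 1.0 - missed            # 0.99, i.e. "99 percent"
```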
The strengths and weaknesses aren't related to the fundamental aspects of the technology. So, for example, in principle, if an Affy probe set is misannotated, you can go back and fix that. If a signature in MPSS is not mapping to the genome because of a poor annotation of the genome, you can go back, carefully look at the genome, re-annotate it, and fix that. Both technologies have considerable strengths, but using them together in a synergistic manner is, I think, state of the art right now. I don't think we have a single, killer-app transcriptomics technology.
Why do you think that the bioinformatics side of these applications is lacking?
I think it's because it's hard. It's also because a lot of technology companies do not have the capital resources to spend on bioinformatics. And I am not necessarily saying that venture capitalists should turn around and start funding that. It may be that bioinformatics is best done by academics.
I think that when companies do create software they are frequently misguided in creating proprietary rather than open-source software, which academics ultimately cannot use or improve, and it may end up in a dead end. The other thing is that I think it just takes time. And time means years of use of a product to understand the statistics. And the product development cycle is such that products are outdated after they've been used for a couple of years. So I think it's going to take a while before bioinformatics really catches up. In the meantime, we use a number of other ways of getting around the issue, and one of them is this duplication approach.
And you say, 'Well, that's very expensive, using two different technologies,' but, in the end, if you sit down and figure out how expensive it would [be to hire] twenty bioinformaticians at $100,000 a year, you may be better off spending $100 on a chip. Maybe that's an exaggerated example, but a lot of people's whole grants just do not provide adequate funding for data analysis. If you write a grant that says, 'I'm not going to generate any new data, I'm just going to analyze data that already exists,' the grant-reviewing committees in general are just going to shoot it down. So that's a pervasive issue that I think we're facing in this decade.
Some people might be hesitant about adopting this two-technology approach, though. For example, I haven't met that many people that are using MPSS, in comparison to those who are running Affy arrays.
I think one of the reasons that you don't see a lot of people using MPSS is that Lynx, the former owner, did a poor job of marketing it. I think awareness of MPSS in the field was not very good. Secondly, it's very expensive on a per-sample basis compared with an Affymetrix chip, maybe even 10 times as expensive. Also, I think a lot of researchers are overwhelmed with the amount of data they have. They don't need a comprehensive analysis — in principle they'd like it — but they can only think about 10 percent of the genes, or one percent of the genes, on the array. So they are plenty well occupied even if they are not seeing a comprehensive view of the transcriptome. Now, one of the goals of my most recent study was to see exactly how far we could push, in terms of sensitivity, the power to detect changes in transcripts. If people really want to see the changes in all transcription factor messages, they probably are going to need to use a technique like the one we used. I'm not saying that they would have to use MPSS and Affymetrix, but I would encourage them to use at least two technologies that are ideally orthogonal, in the sense that they are based on different biochemical principles and have different statistics and maybe different bioinformatics principles.
You mentioned that some of this was published earlier this year. Will you publish more in the future?
What was published earlier this year was our statistical model, not our data. So we'll be publishing our data soon.
How soon is 'soon'?
Soon. With scientific journals you never really know until you finally get an accepted paper and a lot depends on what reviewers require.
Are there any other studies you are working on that use this 'two-fisted' approach with MPSS and the Affy GeneChip?
The major projects that we are working on here at the Institute for Systems Biology are cancer-related, particularly understanding prostate cancer, and also we are working on understanding Type I diabetes, which is research that is funded by the JDRF — Juvenile Diabetes Research Foundation. We are using this approach in both of these studies.