Name: Elaine Mardis
Position: Co-director (since 2002) and director of technology development (since 1993), the Genome Center at Washington University School of Medicine
Associate professor, department of genetics, Washington University School of Medicine
Experience and education:
Senior research scientist, Bio-Rad Laboratories, 1989-1993
PhD in chemistry and biochemistry, University of Oklahoma, 1989
BS in zoology, University of Oklahoma, 1984
As co-director and head of technology development at the Genome Center at Washington University School of Medicine, Elaine Mardis is responsible for evaluating and bringing online new DNA sequencing technologies.
In Sequence caught up with her at the Advances in Genome Biology and Technology conference this month and asked her about her role and her outlook on the future of sequencing.
Give me a short overview of the Genome Center. What is its mission, and what are some of the most prominent projects going on right now?
The general mission, as we try to describe it, is 'Genomes to Health.' Because we are at a medical school, we really want to, as much as possible, bring the power of genomics to solving various health problems.
In terms of the major areas of focus, cancer is clearly one of those. We want to be as comprehensive as we can be about characterizing cancer genomes. Another big focus is microbial [genomics]. Microbes influence human health in ways that even transcend the number of lives that are impacted by cancer on a yearly basis. It was such a great thing to be able to bring George Weinstock to the Genome Center at the end of '07, because he just has a wonderful background in microbial work.
With our [National Human Genome Research Institute] funding, we have begun to look into what we call medical genomics. This is really starting to knock away, with genomic technologies, at the complex diseases that fall outside of cancer. We have some active projects going on right now, for example in metabolic disease, and we just recently were assigned a project in age-related macular degeneration.
But we also will continue to do the de novo genomes. There is a lot on our plate for that in terms of some primate species, many of which are model organisms for various kinds of human diseases. So there is a clear context in which those genomes can now begin to have an impact on the health of humans as well.
Was the platypus genome last year sort of an outlier?
It falls under the de novo genomes that help us better understand the annotation of the human, believe it or not. There is a lot to be learned from these sorts of outlier species in terms of how well we have things annotated [and] how different they are from us.
How is the Genome Center equipped with sequencing instrumentation?
We basically have the eight Titanium 454 instruments in place, and then we have just scaled up on the Illumina sequencers, so over the course of the next few months, we will end up having 35 of those on hand. Obviously, we have some 3730s that we are still running because of the various projects that we have going on that still involve large-clone sequencing. Some of those we are trying to move over to the 454 as well, depending on how well the 454 suits them.
You are both co-director of the Genome Center and head of its technology-development group. How do you evaluate new sequencing technology?
We usually follow the same series of steps. We usually assign one or two people to the given technology, and they really dig in deep with the technology provider to become as expert as possible. In the past, it was really sufficient to have a couple of people with biology experience. That's really not the case anymore — you have got to have at least one person with knowledge in informatics/bioinformatics in the mix, just because of the informatics overhead on these sequencers.
There are multiple levels [at which] we have to evaluate the sequencer. There is the process by which the DNA is prepared — call it library construction — and that's evaluated at multiple levels, including success versus failure and how difficult it is. Probably a sub-component of that is, 'What's the amount of effort that's required?' Because at the end of the day, effort really translates into money: The harder people have to work, and the longer it takes them to prepare it to put on the sequencer, the more money it costs us. And part of that component, too, is how much additional equipment might be required, whether there are ancillary pieces that we absolutely have to have. More recently, the other component that we think about as we scale up on these sequencers, and the number of libraries, is, 'What's the automation potential?' But I would say that's kind of a secondary concern.
Beyond this process to get to the point where you are ready to generate sequence, there is the sequencing process itself. Some of the components that come into play there are, for example, 'How robustly is the instrument built?' Also, [what comes with] the hardware is software, so we have to evaluate, 'What's the quality and performance of the software, how well does it interact with the hardware, and are there problems?'
And then, downstream of data generation, and probably more specific to next-gen sequencing than it has been in the past, 'What's the data overhead? What's the amount of data that's generated per run, how intelligent are the designs for how that data amount is dealt with?' Things along those lines. And that really goes to what the compute overhead is that we are going to take on for each run of the instrument. It's really interesting with regard to next-gen, because all that downstream stuff — in terms of the computational [equipment] and file sizes and that sort of thing — has really become a part of the cost equation. When we first got started, we were estimating that on some sequencers, the cost of the sequencer was approximately equivalent to the cost of all of the extra computer hardware that we were going to have to take on, just for that one additional sequencer. I don't think that with evolution of the sequencers it is still a one-to-one proposition, but it's still a pretty large component of the cost equation, and I think that's a part that people kind of tend to not think about, but we think about it a lot, because it is such a large overhead.
We tend to get sequencers at different points in their readiness, or their approach toward commercialization. So if we are getting a sequencer in fairly early in that process, that's certainly something that we are equipped to deal with, and that we are used to dealing with. But part of what we try to watch really carefully is, 'What's the trajectory of development from the company to us?' Are we still sitting, six to eight months out, with a basically unchanged piece of equipment, or has the feedback that we provided, and what other beta testers have provided, led to any kind of substantial or tangible change? And that could be at any level — if the library process is difficult or prone to failure, has that been addressed? Sometimes, we'll address it on our own, but it's always good to see the manufacturer contributing to that if that is an issue. Similarly, with aspects of machine robustness and reliability, the software interface can often be very challenging to begin with, because those are commonly the things that are slower to develop. But if there isn't a development trajectory, then that's kind of a big, red flag.
Finally, the way we have always done things at Wash U is to have a technology development group that is physically distinct from our production entity. We have never intermingled those two because we have always felt that at the level we are testing things, there are things that are going to be longer in the development phase, and then there are things that are going to phase in earlier. But we would never want to disrupt the actual production of data with a new technology by not vetting it ourselves within this distinct entity that is technology development.
I guess our last sort of acid test, if you will, is, when we really feel that we have worked out all the kinks, and to some extent, the technology is stable — and that's always been a bit of a judgment call, because sometimes we think things are stable but they are still at a point of transition, and we need to get them into production, so we have got to anticipate that rather than wait for it — then our last point of crossover is this transition into production. We always do that very carefully. For example, with each of these next-gen technologies, as they have transitioned into production, we have begun by nucleating a small production group, training those people who are going to be working with the instrument in the various processes that are involved in running the instrument, as well as making the libraries, et cetera, and then letting them begin to produce data and evaluating that data very carefully before we go to more people and more instruments. It's really a lot, in some ways, like running a new assembly line in a factory. You really want to make sure that the product you are putting out is comparable, can be QC-ed. And that's a paradigm that's worked well — we have used it, basically, for every major instrument system over time, and we will continue on that trajectory, as well. I see no reason to walk away from it.
Where do you see the most exciting changes in sequencing technology coming from, and what kinds of biological questions will they be able to answer that cannot be answered with the current technology?
Clearly, the move towards the single molecule — [Pacific Biosciences] for example — is the leader. You know I'm on their scientific advisory board.
Even in the last eight months, Illumina has had some major breakthroughs that [are] solving the read-length [challenge] at ultra-high throughput, as opposed to the medium throughput (everything's relative nowadays) that we are getting out of the Roche/454. [Roche/454] started with longer read lengths, has continued to extend them, and probably has some plans somewhere that I haven't heard about for a more highly parallel machine. But nonetheless, neither that technology nor Illumina's addresses this single-molecule space.
I heard a number of talks at this meeting alone that could fit very nicely into that single-molecule sequencing space, for example to detect sequence variants that are not PCR-introduced. I'm thinking specifically [of the talk by] Mike Kozal [from Yale University] regarding the different viral variants and quasi-species [of HIV]. And along those lines, I think [the technology] really throws wide open the possibility that we might start teasing apart various aspects of infection.
The hope is that we will be able to drive the systems to where we can start not by sequencing one species in isolation, but sequencing that species, and sequencing the biology that it's impacting. And the other exciting thing about this is, how quickly does that transition into clinical diagnostics?
I worry sometimes because people think that I think that sequencing is sort of the answer to everything. I really don't; I hate to give that impression. I see it as opening up a huge discovery potential. That's really what we [have] to have as an important first step to all of the things that are going to come downstream.
But at the same time, I also worry about whether people who can have an impact on these sorts of things are really looking further ahead, because this is all going to start coming pretty fast and furious, and what I don't think we have in place in many of these systems – cancer, for example – are good, high-throughput ways to assign functionality and importance to the various types of variants that we are going to find, and really couch those in a biological context that says, 'This is important,' it's a driver, if you want to use that word, 'this is a target for drug therapy,' and things like that.
We have got to get biologists, pharmacologists, people that are interested in developing new antimicrobials, engaged, and really start identifying, through a variety of mechanisms, what the ways are that we can scale up all the downstream biology. Because otherwise, it's just going to be interesting factoids that we collect about sequence, and it will take everybody else 50 to 100 years to pick up on what we have learned and really turn it into knowledge. So it is encouraging to have people here from other areas and other disciplines that are coming to the meeting for the first time, that are kind of recognizing that there is a potentially huge impact, and they want to be a part of it.
What will large genome centers look like in the future? Will sequencing become more decentralized?
I think the demographic of our center has shifted over time to be more on the analysis side. And that's because our ability to produce data is so much greater than it ever has been that the emphasis has to shift to the downstream analysis, even though the production of data is still really important and still requires a lot of exquisitely detailed training and attention to detail.
In our center, having been part of the scale-up in 1999 for the human genome sequencing [project], we really bulked up in terms of the number of people who were dedicated to just producing the raw reads that were going into the human genome assembly. Since then, the shift has been to fewer production people, more bioinformatics people, more informatics people, and to some extent, a few more technology development people as well.
In terms of the sequencing being more decentralized, I think there will be some of that; I think that's what kind of makes it neat, in a way. I don't really worry about running out of things to do. But I do think that more scientists will begin to avail themselves of the power of sequencing, whether it's through advanced facilities, whether it's through some of the sequencing service providers that are springing up here, there, and everywhere, or whether it's through collaborations with genome centers. I think that there is plenty to go around, that there are plenty of cool questions to answer, and that, really, it's just a matter of how you like to operate.
Complete Genomics made its debut at this conference. Where do you see that company's role, and can you imagine outsourcing some of your human genome-sequencing project to a service like this?
I can't really provide them with a role right now because they are a complete mystery. [They should] provide datasets for people to look at, then I'll know whether there is a role for them. We had these conversations with them when they were in our shop. There is an issue that they haven't necessarily dealt with, which revolves around privacy and IRB and sample fidelity, that I think they are going to have to broach. I think they are working on it — they are not stupid people — but it has got to be solved. And quite honestly, there may be issues at different institutions with samples going off-site.
In terms of outsourcing, would we do that? I guess it comes down to whether you take on more than you can do. I have contributed to developing over the years, along with other talented people at our center, a QC-driven process that we have to follow. So I'd have to become intimately comfortable with whatever QC/QA processes are involved there, and I don't really have a feel for that.