The machine sits, unfinished, in a corner. Under six feet long, with three linked stainless steel chambers that could each fit comfortably in your arms, it's a toy compared to the massive metal-stamping and milling machines that dominate factories around the world. But in this shop, size doesn't matter. This is the TOF/TOF mass spectrometer, and it's the instrument that could catalyze the next phase of the life sciences industrial revolution.
Or not. Celera Genomics, the well-heeled upstart that produced a working draft of the human genome in two years, is betting hundreds of millions of dollars that TOF/TOF and other complementary technologies will now enable it to generate the human proteome, the entire protein complement of the human genome. Achieving this audacious goal will require tools that do not yet exist or, like TOF/TOF, are not yet ready. And these tools could have limitations that no amount of money and technical genius can cure. Is Celera making a colossal error by betting its future on this machine?
Doing the genome — again
Recent history indicates otherwise. In late 1997, Applied Biosystems chief Mike Hunkapiller talked Craig Venter into heading up a new company that would use ABI's new Prism 3700 to sequence the human genome. Within three years, Celera had the draft genome in hand and a market capitalization of $4.2 billion (October 2000). Venter's personal net worth had surpassed $110 million.
In a stock offering last March, Celera raised $944 million that it said it would use for a "human protein database" and other functional genomics projects. The goal: the human proteome. "We'll be working through every tissue, organ, and cell," Venter told Science at the time.
Once again, a machine — this time the TOF/TOF — will make or break the project. At the speed its inventors predict, entire proteomes could be analyzed in a reasonable period of time, something unthinkable now. "If that instrument performs well," says University of Michigan proteomics researcher Phil Andrews, "it will represent a technical revolution in life sciences, as great as, perhaps greater than, the development of the DNA sequencer."
That's because proteomics promises to give meaning to the genome by revealing gene function—and mass spectrometry is the proteomics bottleneck. Tandem mass spectrometers can only reliably process a few peptides for identification in a given hour; robotic systems can separate proteins and supply their peptides, at the front end, much faster. Recognizing the eventual demand for a machine that could identify proteins on an industrial scale, researchers at PerSeptive Biosystems in 1996 set to work on TOF/TOF. By the time PE acquired PerSeptive in January 1998, the project was well underway.
Two men, more than anyone else, are the brains behind TOF/TOF. Steve Martin, a protein chemist by training, has been working on mass spectrometry systems for proteins since his days as a grad student at MIT in the early '80s. He moved to the Medical University of South Carolina and then Genetics Institute, before landing at PerSeptive in early 1994. "I decided I would have to stop complaining about instrument companies if I didn't take this job," he recalls. At PerSeptive, "I could complain … and try to change the paradigm."
It was Marvin Vestal, already a legendary figure in mass spectrometry circles, who talked Martin into joining PerSeptive. The two men are a study in contrasts. Martin, compact and youthful-looking at 43, is expansive, loquacious, and appears to enjoy dealing with the public and the press. As director of Applied Biosystems' new Proteomics Research Center, his role is organizational and managerial. The heavyset Vestal, 66, avoids publicity and is much more reserved, even professorial, in manner. He's the hands-on creative force.
When Vestal is not designing instruments, he's at the workbench bolting them together by hand. "He's the kind of guy who can make almost anything work," says the University of Michigan's Andrews, who has been using Vestal-inspired instruments for more than a decade. Vestal's age makes no difference, say those who know him. "The fact that he's in his mid-60s is not going to slow him down," says former colleague Calvin Blakeley. "It's certainly not going to slow him down mentally. He's driven."
Vestal's career spans almost the entire history of mass spec for proteins, and his work has been pivotal. In the early '80s, as a professor at the University of Houston, he founded a company called Vestec to make mass spectrometers for biologists. At the time, mass spectrometry was almost useless for analyzing proteins because samples had to be converted from solid or liquid into gas phase, and heating proteins usually destroyed them. (Mass spectrometers identify molecules by sorting their gaseous ions according to their mass-to-charge ratio.) To make gas ions from liquid proteins, Vestal invented thermospray, the first effective coupling of liquid chromatography (LC) protein separation to a mass spectrometer — a big step forward for biologists trying to identify proteins.
Vestal didn't stop there. Although electrospray and MALDI (matrix-assisted laser desorption/ionization) superseded thermospray as the ionization methods of choice later in the decade, it was around 1990 that Vestal made a lasting impact by coupling MALDI to a time-of-flight mass spectrometer to create the first commercial MALDI-TOF instrument. PerSeptive bought Vestec in 1993, and was acquired by PE five years later. Today MALDI-TOF is a fixture in protein biology labs around the world, and Houston is gearing up to manufacture the TOF/TOF.
Andrews says Vestal "has an intuitive understanding of ion physics." He adds, "He also has a very grand vision of the future of mass spectrometry and proteomics."
At the 1999 Asilomar Conference on Mass Spectrometry, Vestal made a provocative case for a human proteome project. "I said something like, When you compare biologists to chemists and physicists, biologists are among the most ignorant people I know," Vestal recalls. "They really know quite a lot, but there's so much more to know. We're really on the threshold." By the end of that year, with the necessary tools in the pipeline, Craig Venter had committed Celera to proteomics.
Getting up to speed
There's a lot at stake: Despite its human genome sequencing accomplishment, Celera lost $92 million in the year ending last June, and losses are mounting. With its stock price down 70 percent from its March high, the bloom is off Celera, and investors will be looking for signs of future profitability. That means big deals with drug companies, which are seeking novel drug targets, not raw sequence.
Years ago, a small number of visionaries, Martin and Vestal among them, already knew that DNA and RNA alone wouldn't be enough to reveal gene function. Proteins were needed. And by looking at which proteins were expressed at what levels in normal cells versus diseased cells, they reasoned, one could infer the cause of a disease, find markers to diagnose it in patients, and discover targets for future drug treatments.
That's a lot harder than it sounds. Because the typical human cell contains thousands of expressed proteins at any given time, many of them differing according to pre-translational and post-translational modifications, the overall complexity of proteomics makes genomics look like child's play. "It has been estimated that the number of actual proteins generated by the human genome, which is all the proteins of the proteome, if you will, is on the order of 10-20 million," says Andrews.
Given the more than 250 different tissue types, millions of protein identifications will be needed to work out the biology of a single disease or pathology, and billions for a complete proteome. "We clearly need an increase in speed to get where we'd like to go," says Vestal.
Why are current methods so slow? The mass spectrometer's ion source is one reason. Electrospray "is limited by the rate at which you can pump liquid into the instrument," notes Martin. "And that's very slow today."
MALDI is faster. You put the protein sample on a plate covered in a matrix of ultraviolet light-absorbing material, hit it with a laser, and generate ions that fly off towards a detector. But although a lot of samples can be spotted on a plate, the laser must fire in discrete pulses, and so the speed is limited by the laser's "rep rate." At the moment MALDI lasers are fired at a maximum rep rate of 10 hertz (cycles per second). It takes about 30 seconds for the typical user to generate a single useful mass spectrum (at 10 hertz) off a sample spotted on a MALDI plate.
Thirty seconds may not sound like much time, but to identify a million proteins in a day, which is Celera's goal, it's far too slow, even employing dozens of machines at once. Martin and Vestal's goal for the TOF/TOF is to increase throughput by an order of magnitude each of the next two years — from a maximum of 100 samples an hour now to 10,000 an hour in 2002. They will do this mainly by pumping up the laser rep rate to 2,000 hertz. "People in the [Proteomics Research Center] have demonstrated that throughput in bursts already," says Martin.
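The arithmetic behind those targets can be checked on the back of an envelope. The sketch below uses only the figures quoted above (10 hertz, 30 seconds per spectrum, 100 and 10,000 samples an hour, a million identifications a day). The function names are invented for illustration, and the calculation rests on two idealized assumptions that are not from the article: machines run around the clock, and every sample yields exactly one identification.

```python
import math

# Figures quoted in the article
CURRENT_REP_RATE_HZ = 10        # laser pulses per second today
SECONDS_PER_SPECTRUM = 30       # typical time to get one useful spectrum
DAILY_GOAL = 1_000_000          # protein identifications per day
CURRENT_SAMPLES_PER_HOUR = 100
TARGET_SAMPLES_PER_HOUR = 10_000


def shots_per_spectrum(rep_rate_hz, seconds):
    """Laser pulses accumulated into one spectrum at a given rep rate."""
    return rep_rate_hz * seconds


def machines_needed(daily_goal, samples_per_hour, hours_per_day=24.0):
    """Instruments required to hit the daily goal.

    Assumption (not from the article): nonstop operation and one
    identification per sample, with no failed or repeated runs.
    """
    return math.ceil(daily_goal / (samples_per_hour * hours_per_day))


if __name__ == "__main__":
    print("shots per spectrum:", shots_per_spectrum(CURRENT_REP_RATE_HZ, SECONDS_PER_SPECTRUM))
    print("machines at today's rate:", machines_needed(DAILY_GOAL, CURRENT_SAMPLES_PER_HOUR))
    print("machines at the 2002 target:", machines_needed(DAILY_GOAL, TARGET_SAMPLES_PER_HOUR))
```

Under these assumptions, today's throughput would call for hundreds of instruments, while the 2002 target would bring the fleet down to a handful. Real-world downtime and failed identifications would push both numbers up.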
Still, bursts in the testing lab are a long way from factory-grade automation.
Applied Biosystems' Proteomics Research Center is a cluster of small laboratories deep within a nondescript office building. The place has the feel of a startup. The building, just off the Massachusetts Turnpike in the Boston suburb of Framingham, still bears the PerSeptive name. In contrast to Celera's monstrous operation, which required the city of Rockville to update its transformer to avert a blackout, the Framingham center more resembles a university basic-science department.
But its goals are anything but academic. "What I'm trying to do in this research center is the work that will drive the [proteomics] field," says Martin. "In genomics, people buy kits. In proteomics, everyone is a garage shop." The center is creating systems to rationalize proteomics the same way cloning kits and sequencing machines rationalized genomics.
Individual labs labeled "Science & Technology" and "Organic Chemistry" line the hallways. In the "Media Research" room where engineering blueprints cover the walls, technicians toil at workbenches and giant spools of wiring compete for floor space with crates of new equipment. In the corner, workers are putting the final touches on the third TOF/TOF prototype.
Speed is the main reason Celera will use TOF/TOF. Vestal resurrected time-of-flight from the scientific dustbin in the early '90s. Invented in the 1950s, the concept is simple. TOF is basically an ion source, a piece of pipe, and a detector. The ions start off and fly through the pipe in a vacuum to the detector, which measures how long it took them to get there. Ions of different masses travel at different velocities, so measuring their flight time at constant charge gives the mass-to-charge ratio. TOF was perfect for MALDI because it could handle the laser-pulsed ions that come off the MALDI plate all at once, as opposed to other analyzers, which scan ions individually and couldn't take advantage of the laser's speed.
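For readers who want to see the physics, here is a minimal sketch of the textbook relation between flight time and mass-to-charge ratio. It assumes an idealized instrument in which every ion picks up the same energy from an accelerating voltage and then coasts down a field-free tube. The one-meter tube and 20,000-volt potential are illustrative assumptions, not TOF/TOF specifications, and the sketch ignores delayed extraction and every other real-world refinement.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
DALTON_KG = 1.66053906660e-27           # kilograms per dalton


def flight_time_s(mass_da, charge=1, tube_length_m=1.0, accel_voltage_v=20_000.0):
    """Idealized drift time: the ion gains energy z*e*V, then coasts the tube.

    Tube length and voltage defaults are hypothetical illustration values.
    """
    mass_kg = mass_da * DALTON_KG
    kinetic_energy_j = charge * ELEMENTARY_CHARGE_C * accel_voltage_v
    velocity_m_s = math.sqrt(2.0 * kinetic_energy_j / mass_kg)
    return tube_length_m / velocity_m_s


def mass_to_charge(flight_time, tube_length_m=1.0, accel_voltage_v=20_000.0):
    """Invert the relation: m/z (daltons per unit charge) from a measured time."""
    kg_per_charge = (
        2.0 * ELEMENTARY_CHARGE_C * accel_voltage_v * (flight_time / tube_length_m) ** 2
    )
    return kg_per_charge / DALTON_KG


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 4000.0):
        print(f"{mass:7.1f} Da -> {flight_time_s(mass) * 1e6:.2f} microseconds")
```

Because flight time grows with the square root of mass, an ion four times heavier takes only twice as long to arrive. With the assumed tube and voltage, peptide-sized ions cross in tens of microseconds, which is why a single laser pulse can be read out almost instantly.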
But the trouble with TOF was always low resolution and poor mass accuracy. So in 1995, Vestal and Peter Juhasz adapted a technique called "delayed extraction" to TOF. By pulsing the ions off the plate, waiting, and then pulsing the ions again towards the vacuum tube, they aligned all the ions like racehorses at the starting gate, making the "finishing times" much more accurate. Now, says Vestal, "time-of-flight, particularly with MALDI, clearly wins in terms of the combination of speed and sensitivity."
In September, Applied Biosystems shipped its first beta model TOF/TOF from Framingham to Celera. A plant in Houston is already beginning to make commercial units for availability in late 2001 and Oxford GlycoSciences has agreed to buy a fleet of the instruments under early access. While the plant's production capacity is confidential, it's designed for high demand.
Presumably, Celera is committed to TOF/TOF as the mainstay of its proteome mapping project. Applied Biosystems officials declined to reveal how many of the instruments Celera has on order, and repeated calls to Celera executives for this story went unanswered. But others will also be able to buy the machine. "Our goal is to sell our proteomics technology to everyone, not just Celera," says Martin.
To be sure, TOF/TOF is at an early stage; a 2,000-hertz machine does not yet exist. Long before it gets there, Martin predicts that "lots of bottlenecks are going to appear both in front and behind the mass spectrometer — as far as feeding it samples, and processing the data that comes out the other side."
Martin won't elaborate on the protein separation and sample feeding technologies that are under consideration, except to say that microfluidic separations, in capillaries or on chips, are in the works. "The big problem," he says, "is going to be data piping: Actually getting the information out of the detector and into some kind of storage device." He's counting on Moore's Law to provide the necessary computer speed two years down the road.
Of course, TOF/TOF's ultimate success is not a given. Lee Hood, godfather of automated genome sequencing, speaks for the skeptics: "While the TOF/TOF looks like a wonderful tool, it still remains to be seen how well it will perform." The promises Martin and Vestal are making, such as 10,000 proteins an hour for a single machine, are so ambitious that some experts doubt they're achievable. "Those numbers are so wild, I'll believe it when I see it," says Mary Lopez, executive vice president for proteomics R&D at Proteome Systems, a company in nearby Woburn, Mass., that is also at work on high-throughput technologies for protein mapping using a sensitive MALDI-TOF mass spectrometer from the Japanese company Kratos. Adds Lopez, "I'm skeptical."
Such speed, even if doable, may carry a cost. "Proteomics on steroids" is what Bill Hancock calls the Applied Biosystems/Celera approach. Hancock is vice president and general manager of the proteomics division of Thermo Finnigan, which last month unveiled a new high-resolution, triple quadrupole mass spectrometer. The problem with Celera's muscular approach, he says, "is the faster you go, the more things you look at, the lower the quality of your data."
Martin doesn't concede that data quality will suffer. But TOF/TOF has some weaknesses that could prove fatal to its makers' grand ambitions. In the first place, it contains a potential inner bottleneck. Any tandem mass spectrometer, or MS/MS, can only identify proteins from a complex mixture by submitting them to two different analyses; a single run isn't enough to pick out unique peptides. (Hence, "TOF/TOF.") Instead, a limited number of "precursor" peptide ions are selected by computer based on intensity, mass, and other qualities, split again in a collision chamber, and again thrown towards the detector. The more revealing peptide "peaks" that appear on the resulting spectra can then be compared to spectra generated in silico from known proteins, the sequence determined and the protein identified. But the "precursor ion selection" process takes time. The right peptide ions must be chosen.
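The two-step logic described above can be sketched in a few lines. This is a toy illustration, not the instrument's software: the function names, the intensity-only ranking, the minimum-mass cutoff, and the half-dalton matching tolerance are all assumptions made for the example, and real search engines score matches far more carefully.

```python
def select_precursors(peaks, max_precursors=3, min_mass=800.0):
    """Pick the most intense peaks from a first-pass spectrum.

    peaks: list of (mass_to_charge, intensity) tuples.
    The min_mass cutoff and intensity-only ranking are simplifying assumptions.
    """
    eligible = [peak for peak in peaks if peak[0] >= min_mass]
    return sorted(eligible, key=lambda peak: peak[1], reverse=True)[:max_precursors]


def count_matches(observed_mz, theoretical_mz, tolerance=0.5):
    """Count predicted fragment masses that have an observed peak within tolerance."""
    return sum(
        any(abs(observed - predicted) <= tolerance for observed in observed_mz)
        for predicted in theoretical_mz
    )


def best_candidate(observed_mz, candidates, tolerance=0.5):
    """Return the name of the candidate whose predicted fragments match best.

    candidates: dict mapping a protein or peptide name to its predicted
    ("in silico") fragment masses.
    """
    return max(
        candidates,
        key=lambda name: count_matches(observed_mz, candidates[name], tolerance),
    )


if __name__ == "__main__":
    # Invented numbers, for illustration only
    first_pass = [(500.0, 900.0), (1200.5, 300.0), (1500.2, 700.0), (950.3, 400.0)]
    print(select_precursors(first_pass, max_precursors=2))

    fragments = [100.1, 200.0, 350.4]
    database = {"A": [100.0, 200.3, 400.0], "B": [150.0, 350.5, 900.0]}
    print(best_candidate(fragments, database))
```

Even in this stripped-down form, the cost is visible: every selected precursor demands its own second spectrum, so the number of precursors chosen directly multiplies the time spent on each sample.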
"It's roughly a second for each one, and there's a real limitation on how many you can do," says physical chemist Dick Smith, a proteomics researcher at the Pacific Northwest National Laboratory in Richland, Wash. "It takes a lot more time to do this tandem mass spectrometry than it does to take a mass spectrum of the whole mixture." Smith is working on an alternative [see sidebar].
Martin expects TOF/TOF to eventually generate 10 unique spectra per second, but admits this goal won't be easy to reach. It would need to reliably select precursor ions against the background "noise" of thousands of ions from a complex chemical mixture smacking into the detector. If it can't, it'll need a system to thoroughly separate proteins on the front end before they even make it onto the MALDI plate. That could slow down the whole process, or leave behind certain low abundance — but potentially important — proteins. The "molecular scanning" process for 2D gels that Celera plans to use is nowhere close to practical reality.
The overall effect of incomplete or faulty separations, the ion selection bottleneck, and background noise could be a failure to identify low-abundance proteins. (This is known as the "dynamic range" problem.) And because this category includes key proteins like tyrosine kinases, cytokines, and transcription factors, their identification is crucial.
Martin argues that running several separations in parallel, along with the hardware and software improvements now underway, will enable TOF/TOF to detect low-abundance proteins reliably, even at the remarkable speeds projected. And, he adds, it will be sensitive, distinguishing peptides of 2000 Daltons in mass separated by a single Dalton. Smith isn't convinced. He insists, "The MALDI-TOF/TOF approach would have both insufficient throughput because of the need for MS/MS and also insufficient sensitivity and dynamic range."
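Martin's figure translates into a standard measure, resolving power, defined as mass divided by the smallest mass difference an instrument can tell apart. The sketch below works that out and also estimates how finely arrival times would have to be separated, assuming the idealized rule that flight time scales with the square root of mass. That scaling is a textbook simplification, not a claim about TOF/TOF's actual design, and the function names are invented for illustration.

```python
import math


def resolving_power(mass_da, smallest_gap_da):
    """Resolving power m / delta-m for two peaks that can just be distinguished."""
    return mass_da / smallest_gap_da


def relative_time_gap(mass_da, gap_da):
    """Fractional difference in arrival time between two ions gap_da apart.

    Assumes flight time is proportional to sqrt(mass), the idealized
    time-of-flight relation.
    """
    return math.sqrt((mass_da + gap_da) / mass_da) - 1.0


if __name__ == "__main__":
    print("resolving power:", resolving_power(2000.0, 1.0))
    print("relative timing gap:", relative_time_gap(2000.0, 1.0))
```

Under that assumption, separating two 2,000-dalton peptides a single dalton apart means telling apart arrival times that differ by roughly one part in 4,000.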
MALDI-TOF has another problem, based on ion physics. In electrospray ionization, peptide ions injected into the mass spectrometer through a high-voltage needle wind up carrying multiple charges. When these peptides fragment in the collision chamber, it's a "low-energy" process, yielding peptide fragments that are easily identified. In MALDI, on the other hand, the laser places a single charge on peptides. These peptide ions fragment in a "high-energy" process that can confound identification, for large proteins in particular. "Tandem mass spectrometry will be much less effective," says Smith. "It results in … more limited sensitivity, and less information."
Because it uses a MALDI ion source, TOF/TOF is stuck with this limitation. "It certainly will be a problem," says Smith.
Not so, says Martin, who argues that today's powerful computers, along with the ability to access entire genomes, make these concerns largely moot. But he admits that high energy might be a tough sell. "There's a lot to do to convince the marketplace," he says. "That's going to be done just by generating data and demonstrating how to solve problems."
What TOF/TOF will have, all agree, is a speed advantage. An enormous one, if it works as planned. And that advantage might even be worth a tradeoff in data quality. "People are going to be dropping the older techniques and going to whatever will give them the highest throughput, even if it gives them incomplete information," says Andrews. "There's something to say for incomplete data if you have sufficiently high throughput."
Celera's dogma, encapsulated in its slogan, "Speed Matters," is based on the premise that getting biological information first is the key to business success. So TOF/TOF seems to be a logical choice, even if its data turn out to be less than perfect.
The speed addicts
Thermo Finnigan's Hancock argues that data quality can't be compromised, and that pursuing the entire human proteome is risky. "Do we want low quality information with proteomics? Absolutely not," he says. "[Celera's] philosophy is, go very fast and characterize as much as you can. Our strategy is to try to get as good quality information as we can, and characterize parts of the proteome at a time." Other companies, like Geneva Proteomics and Proteome Systems, are also pursuing more focused proteomics objectives, analyzing individual organs, tissues, drug effects, or disease states, and will try to balance speed against data quality.
Even if Celera manages to do the entire human proteome accurately and thoroughly, critics argue that the data may not be worth the effort, since it will amount to a snapshot of a constantly changing system. The Celera effort "[is] consistent with the approach they've taken with the genome," says Proteome Systems' Lopez. "The trouble is that the genome is static and the proteome isn't. Not only is this an ambitious thing, but it may be very limited. There may be many different proteomes.
"I'm not saying it's a bad approach," she adds. "[But] it's difficult to transfer a genomics approach to proteomics. It shows some naivete. I know it won't be as simple as they think."
Celera must, in any event, generate data. Lots of it. Craig Venter will be counting on Martin and Vestal's machine, despite its limitations, to make good on his promise of a million protein identifications a day. "If these numbers were named by virtually anyone else, one would say this is probably not realistic," says Ruedi Aebersold, a prominent proteomics researcher at the Institute for Systems Biology in Seattle whose own protein-quantification technology, ICAT, was licensed exclusively by PerSeptive for commercialization this year. "If TOF/TOF throughput is as advertised, and if they put sufficient amount of resources into the project — which they really do have — then I wouldn't discount their claim."
But no one knows whether TOF/TOF will give Celera its second triumph, or whether the practical limitations of Marvin Vestal's machine will outweigh its technical virtuosity. Will TOF/TOF prove to be the biological equivalent of the Concorde: speed at all costs? Possibly, like the Concorde, it'll end up as an expensive novelty instead of the industry standard, with potentially dire consequences for Celera. "There are a lot of issues there that haven't been addressed," says Smith. "Certainly, the burden of proof is on them to show it can be done."
Tough TOF/TOF Competition
If it works, Applied Biosystems' TOF/TOF mass spectrometer should be a revolutionary machine. But other, potentially better ones are on the way. One example: a mass spectrometer Dick Smith is developing for the Department of Energy at Pacific Northwest National Laboratory. Instead of TOF or quadrupole technology, Smith chose to work with a Fourier transform ion cyclotron resonance (FT-ICR) mass analyzer. FT-ICR is the highest precision method of all, because ions, which are trapped in a box within a magnetic field, move continuously around in an orbital, or "cyclotron," motion and their signal frequency can be measured again and again. But this usually takes time.
Smith, though, has refined FT-ICR, jacking up the speed. On the machine's front end, he separates proteins in a single, quick run through a capillary tube and feeds their ions directly into the mass spectrometer by electrospray. "The throughput of the approach, how many peptides we can make measurements on in one experiment, is much, much larger than is possible in the MS/MS approach," says Smith, "by a factor of, I would guess, roughly a hundred."
"It's very fast, because it does not necessarily need to fragment the peptides," agrees Ruedi Aebersold, a proteomics inventor at ISB. Unlike TOF/TOF, Smith's machine has no MS/MS, with its ion selection bottleneck, to slow him down. So "his throughput is vastly accelerated," says Aebersold. "[With] fantastic mass accuracy. And it has very, very high sensitivity."
The problem with Smith's mass spectrometer is that it's a homemade machine, built at huge cost. He's working on an improved model, but a commercial version is nowhere near. Smith is now talking with an unspecified instrument company about bringing his instrument to market. "Our hope is that, in a year-and-a-half or two, there will be commercial implementation," he says.