Being the self-appointed watchdog for accuracy in biology is a lonely job.
by John F. Lauerman
They still refer to it as “Yeast Hell.” In the spring of 1996, a team at Boston University’s BioMolecular Engineering Research Center used every available computational tool to analyze the world’s first complete, published eukaryotic genome: yeast. Even on the day that thousands of Boston Marathon runners streamed past the lab’s windows, inside, Temple Smith pushed his team to keep up its own grueling pace.
“He was like a general leading his troops into war,” recalls Bob Rogers, a staff programmer who has now worked eight years with Smith. “He was definitely in his element.”
Six months later, Smith fired off a letter to Nature pointing out that as many as 500 of yeast’s predicted genes probably weren’t genes at all. The pressure to finish the sequencing work, he says, had led to sloppiness.
“Boy, did we get a lot of nasty letters,” Smith recalls over pizza and a salad at a Kenmore Square eatery.
For more than 30 years, self-described “guerrilla scientist” Temple Ferris Smith has maintained his own state of siege on a field to which he has, by all accounts, contributed immeasurably. With a motto of “Trust no one, not even yourself,” he has used a background in physics and mathematics to provide tools for biologists with one hand and to question and goad them with the other. The approach has made him one of the most controversial characters in molecular biology, to put it kindly.
But to biological informatics specialists who protest that their contributions are ignored, Smith is something of a hero. He has consistently published in the best journals, boosted the field’s respectability, and mentored postdocs in an area where grant funding has been hard to find.
“When I told my postdoctoral advisor I was going into this field, he said it was career suicide,” recalls Mark Adams. Today, from his position as senior director of bioinformatics at Variagenics, his years in Smith’s lab look like one of his best decisions.
Not that the BioMolecular Engineering Research Center was a shelter from conflict. Like a cowpuncher on the range, Smith has been known to brand others’ ideas as foolish, right out on the open conference floor. “If you’re going to say someone is wrong, you might as well do it where everyone can see it,” he says.
“There are too many people in the world who are afraid to speak unpleasant truths and willing to hide their heads in the sand,” agrees another former Smith postdoc, Rick Lathrop, now associate professor of computer science and information at the University of California, Irvine. “Temple is willing to call a spade a spade.”
Although widely recognized as a computer jockey, Smith is often the first to challenge reliance on technology. Once, Adams recalls, Smith became disgusted with how slowly a computer was fitting equations to a set of data points and quickly wrote out an equation himself that worked almost perfectly. “It was like John Henry and the steam hammer,” he marvels. “He had to prove he was better.”
Why does Smith return so often to what may seem like small errors in otherwise sound, productive, exciting research? His simple answer: Mistakes and inconsistencies in gene databases, such as GenBank and the Incyte expression database, both of which Smith was key in creating, form a snowballing problem that obstructs advances in analysis. How, asks Smith, can you search for the cellular signaling G-proteins if there are 18 different terms in use?
Clarifying data, corrupting colleagues
Smith’s new pursuit is to eradicate ambiguity from databases. Rather than using proteins, which often carry out multiple functions in the body, as units of analysis, Smith advocates annotating and searching the more tractable domains that make up proteins. A domain-based annotation system could not only heal ailing databases, but simplify the question of protein-structure prediction.
Last year, he and Kevin Jarrell founded Modular Genetics, a company that will specialize in assembling domains into customized proteins.
Born in Auburn, NY, Smith was a physics student at a small college in northern Michigan who craved the frontier — any frontier would do. He trained briefly and enthusiastically in the early 1970s under Jack Sadler, who was then investigating the activation of the gene for lactose, and periodically conducted summer research at Los Alamos National Laboratory.
Writes Michael Waterman, who was also at Los Alamos, “I was an innocent mathematician until the summer of 1974. It was then that I met Temple Ferris Smith. That experience transformed my research, my life, and perhaps my sanity.” Their intense collaboration led to the Smith-Waterman algorithm, still the most widely used mathematical approach to finding similarity between gene sequences.
Smith admits that he leaned heavily on existing formulas to develop the algorithm. “All I did was add a zero in the right place,” he recalls. In doing so, he allowed for the appearance of gaps in likeness, giving the algorithm the flexibility to align “locally.”
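As a rough illustration of that “zero in the right place,” here is a minimal Python sketch of local alignment in the Smith-Waterman style. The function name, the scoring values (match +2, mismatch -1, gap -1), and the simple linear gap penalty are assumptions chosen for illustration, not details taken from Smith or from the original paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions; gaps use a linear penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = H[i - 1][j - 1] + sub   # align a[i-1] with b[j-1]
            up = H[i - 1][j] + gap         # gap in b
            left = H[i][j - 1] + gap       # gap in a
            # The 0 lets an alignment restart anywhere instead of dragging
            # a negative score along; this is what makes the result "local".
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two sequences that share only a short central region.
    print(smith_waterman("XXACGTYY", "ZZACGTWW"))
```

With these assumed scores, the two sequences in the usage line share only the stretch ACGT, and the sketch reports that stretch rather than forcing the unrelated flanking letters into the alignment.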
Largely ignored at first, the algorithm gained popularity as the number of known gene sequences surged. After taking a position at the Dana-Farber Cancer Institute in Boston, Smith attracted NIH funding to start a Molecular Biology Computer Research Resource in 1985, which in 1991 became the BioMolecular Engineering Research Center.
Recently, his collaborators and former students celebrated his 60th birthday with a party and T-shirts bearing the Smith-Waterman algorithm on one side and, on the other, the slogan “Think Globally, Align Locally.” A few days later, a call from his mentor told Adams that nothing had changed.
Adams recalls: “He said, ‘I found an error in your damn shirt!’”