Title: Senior fellow, University of Washington
Education: PhD, University of Cambridge, 2008
Recommended by: Francis Collins, National Institutes of Health
Michael Hoffman was a plant biochemist doing immunohistochemistry in Karen Browning's lab at the University of Texas at Austin when the Arabidopsis genome was published. Now a senior fellow in Bill Noble's group at the University of Washington, Hoffman is applying machine-learning techniques to genomic data sets generated as part of the National Human Genome Research Institute's ENCODE project.
As part of his work with ENCODE, Hoffman is refining a computational tool called Segway, "which does a simultaneous segmentation and clustering of functional genomics data" from ChIP-seq and other experiments, he says, "and tries to find patterns" within multiple data tracks. "It's really an attempt to see what you can find if you throw everything at a computer," he adds.
Hoffman's transition from biochemistry to computational biology, though not exactly simple, was facilitated in large part by the support of Ewan Birney, his PhD supervisor at the University of Cambridge. Birney "really encourages a lot of creativity in how you look at genomic problems," Hoffman says. That emphasis has paid off, especially as he's now applying electrical engineering and computer science techniques to analyze noisy biological data, he adds.
According to Hoffman, the current genome-wide screening technologies generate a fair amount of uncertainty. "You [can] sequence a few hundred base pairs … but it only tells you so much because you're excising that sequence from its neighborhood," he says. The more researchers understand about how the genome is regulated, the larger the role its neighborhood appears to play, he says. "Potentially, interactions with the other arm of the chromosome, or some other chromosome," could be functionally important.
Compounding the problem, Hoffman adds, most of the machine-learning techniques he uses were originally developed for natural language processing. "Computer scientists had a big advantage when developing these: they knew what the right answer was," he says. "A problem in genomics is that you never really know what the right answer is. The best you can hope to do is compare [your results] against what you already know."
For Hoffman, finding the right answers is a race against time. He expects that over the next five to 10 years, the inundation of genomic information will only get worse. Wet lab researchers ought to learn how to perform their own computational analyses, "otherwise there will be a tremendous backlog of data. ... Right now I'm not sure there are enough bioinformatics geeks to go around," Hoffman says.
Publications of note
Hoffman says his best work to date appeared in Genome Research in 2010. In a paper he co-authored with Birney, Hoffman describes Sunflower, a package that models transcription factor binding and provides an "interesting ... look at the selective pressure that may have caused" past binding competitions in the human genome.
And the Nobel goes to ...
If the Nobel Prize committee chose to honor Hoffman, he hopes it'd be for teaching "a computer to understand genomic regulation with the same degree of accuracy as we can understand speech. If a computer could predict how genes were going to be regulated, or how a developmental program is organized, with [a high] level of accuracy, that would be really fantastic."