Michael Hoffman: Noisy Genomic Data, Made Clearer


Title: Senior fellow, University of Washington
Education: PhD, University of Cambridge, 2008
Recommended by: Francis Collins, National Institutes of Health

Michael Hoffman was a plant biochemist doing immunohistochemistry in Karen Browning's lab at the University of Texas at Austin when the Arabidopsis genome was published. Now a senior fellow in Bill Noble's group at the University of Washington, Hoffman is applying machine-learning techniques to genomic data sets generated as part of the National Human Genome Research Institute's ENCODE project.

As part of his work with ENCODE, Hoffman is refining a computational tool called Segway, "which does a simultaneous segmentation and clustering of functional genomics data" from ChIP-seq and other experiments, he says, "and tries to find patterns" within multiple data tracks. "It's really an attempt to see what you can find if you throw everything at a computer," he adds.

Hoffman's transition from biochemistry to computational biology, though not simple, was eased in large part by the support of Ewan Birney, his PhD supervisor at the University of Cambridge. Birney "really encourages a lot of creativity in how you look at genomic problems," Hoffman says. That emphasis has paid off, he adds, especially now that he is applying electrical engineering and computer science techniques to analyze noisy biological data.

According to Hoffman, the current genome-wide screening technologies generate a fair amount of uncertainty. "You [can] sequence a few hundred base pairs … but it only tells you so much because you're excising that sequence from its neighborhood," he says. The more researchers understand about how the genome is regulated, the larger the role its neighborhood appears to play, he says. "Potentially, interactions with the other arm of the chromosome, or some other chromosome," could be functionally important.

Exacerbating the issue, Hoffman adds, is the fact that the majority of machine-learning techniques he uses were originally developed for natural language processing. "Computer scientists had a big advantage when developing these — they knew what the right answer was," he says. "A problem in genomics is that you never really know what the right answer is. The best you can hope to do is compare [your results] against what you already know."

Looking ahead

For Hoffman, finding the right answers is a race against time. He expects that over the next five to 10 years, the inundation of genomic information will only get worse. Wet lab researchers ought to learn how to perform their own computational analyses, "otherwise there will be a tremendous backlog of data. ... Right now I'm not sure there are enough bioinformatics geeks to go around," Hoffman says.

Publications of note

Hoffman says his best work to date appeared in Genome Research in 2010. In a paper he co-authored with Birney, Hoffman describes Sunflower, a package that models transcription factor binding and provides an "interesting ... look at the selective pressure that may have caused" binding competitions in the human genome in the past.

And the Nobel goes to ...

If the Nobel Prize committee chose to honor Hoffman, he hopes it'd be for teaching "a computer to understand genomic regulation with the same degree of accuracy as we can understand speech. If a computer could predict how genes were going to be regulated, or how a developmental program is organized, with [a high] level of accuracy, that would be really fantastic."