Michael Hoffman: Noisy Genomic Data, Made Clearer


Title: Senior fellow, University of Washington
Education: PhD, University of Cambridge, 2008
Recommended by: Francis Collins, National Institutes of Health

Michael Hoffman was a plant biochemist doing immunohistochemistry in Karen Browning's lab at the University of Texas at Austin when the Arabidopsis genome was published. Now a senior fellow in Bill Noble's group at the University of Washington, Hoffman is applying machine-learning techniques to genomic data sets generated as part of the National Human Genome Research Institute's ENCODE project.

As part of his work with ENCODE, Hoffman is refining a computational tool called Segway, "which does a simultaneous segmentation and clustering of functional genomics data" from ChIP-seq and other experiments, he says, "and tries to find patterns" within multiple data tracks. "It's really an attempt to see what you can find if you throw everything at a computer," he adds.

Hoffman's transition from biochemistry to computational biology, though not exactly simple, was facilitated in large part by the support of Ewan Birney, his PhD supervisor at the University of Cambridge. Birney "really encourages a lot of creativity in how you look at genomic problems," Hoffman says. That emphasis has paid off, especially as he's now applying electrical engineering and computer science techniques to analyze noisy biological data, he adds.

According to Hoffman, the current genome-wide screening technologies generate a fair amount of uncertainty. "You [can] sequence a few hundred base pairs … but it only tells you so much because you're excising that sequence from its neighborhood," he says. The more researchers understand about how the genome is regulated, the larger the role its neighborhood appears to play, he says. "Potentially, interactions with the other arm of the chromosome, or some other chromosome," could be functionally important.

Exacerbating the issue, Hoffman adds, is the fact that the majority of machine-learning techniques he uses were originally developed for natural language processing. "Computer scientists had a big advantage when developing these — they knew what the right answer was," he says. "A problem in genomics is that you never really know what the right answer is. The best you can hope to do is compare [your results] against what you already know."

Looking ahead

For Hoffman, finding the right answers is a race against time. He expects that over the next five to 10 years, the inundation of genomic information will only get worse. Wet-lab researchers ought to learn how to perform their own computational analyses, "otherwise there will be a tremendous backlog of data. ... Right now I'm not sure there are enough bioinformatics geeks to go around," Hoffman says.

Publications of note

Hoffman says his best work to date appeared in Genome Research in 2010. In a paper he co-authored with Birney, Hoffman describes Sunflower, a package that models transcription factor binding and provides an "interesting ... look at the selective pressure that may have caused" binding competitions in the human genome in the past.

And the Nobel goes to ...

If the Nobel Prize committee chose to honor Hoffman, he hopes it'd be for teaching "a computer to understand genomic regulation with the same degree of accuracy as we can understand speech. If a computer could predict how genes were going to be regulated, or how a developmental program is organized, with [a high] level of accuracy, that would be really fantastic."