Ben Brown: Mathematical Rigor for ENCODE and ModENCODE

Recommended by: Steve Brenner, University of California, Berkeley

"I started out life as a mathematician," says Ben Brown, a computational biologist at the University of California, Berkeley. Working on an undergraduate degree in pure mathematics, "I didn't see a number or a computer for four years. I was covered in chalk dust all the time, and did a bunch of stuff that was very enjoyable, but realized by the end of it that it was not going to be applicable to anything in the real world any time soon."

For his graduate work, Brown sought "to move a little bit closer to the front lines for the actual betterment of society," and in 2005 landed a position in Peter Bickel's lab at Berkeley, where he has worked ever since, primarily on statistical analysis for the ENCODE and ModENCODE projects.

Bickel, one of the founders of nonparametric statistics, has had a tremendous influence on Brown's research. "The idea behind nonparametric models is that you make as few assumptions as you possibly can about the data, and then when you've assumed almost nothing whatsoever, what can you still infer?" he says. "That is something that really resonates with me, from my purist mathematics upbringing — trying to understand data, be it biological or anything else, with as little prior sentiment as possible."
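As a rough illustration of inferring something while assuming very little, here is a minimal Python sketch of a permutation test, a standard nonparametric technique. It is not drawn from Brown's or Bickel's work; the function name `permutation_test` and the sample values are hypothetical, invented purely for this example. The test makes no assumption about the shape of the underlying distribution, only that group labels would be exchangeable if there were no real difference.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration only. Returns a Monte Carlo
    estimate of the p-value.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements, not real ENCODE data.
treated = [5.1, 6.3, 5.8, 7.0, 6.6, 6.1]
control = [4.2, 4.9, 5.0, 4.4, 5.3, 4.7]
p_value = permutation_test(treated, control)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value here means that few random relabelings of the pooled values produce a gap in means as large as the one observed, a conclusion reached without specifying any parametric model for the data.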

Looking ahead

This approach has served Brown well over the last several years as he's sifted through the massive ENCODE datasets, earning him a co-authorship on the consortium's main paper published in Nature earlier this year.

But these skills will likely be more applicable to some projects Brown has on the horizon.

In one project, Brown will work with Bickel's team to develop better methods of dimension reduction for ENCODE data. This effort grew out of a desire within the consortium to be able to associate each base of the human genome with the information that the ENCODE project has added about it.

"That sounds great, but the problem is that the dependent structure in the data is incredibly complex," Brown says. As an example, a single DNA binding factor was analyzed in 36 cell lines by one lab alone, and all of these measurements are highly dependent on each other. "If you had 3,000 independent measurements at every base, that would be trivial" to represent, he says. "But you don't have that. You have this very high dimensional dependence."

In another project, Brown is focusing on improving nonparametric statistical methods for functional genomic analysis. And he's also working with outside collaborators to apply these statistical tools in the translational research setting, including projects studying chronic obstructive pulmonary disease and epilepsy.

And the Nobel goes to…

Brown says that if he were to win the Nobel Prize, he'd like it to be for contributions toward "a cohesive and quantitative picture of gene regulation in an animal system."

Specifically, he adds, while an enormous amount of time and money has been invested in understanding the first half of molecular biology's central dogma of "DNA codes for RNA, which codes for protein," a complete understanding of gene expression will require an understanding of "not just one linkage in the trafficking of genomic information, but the entire chain."
