Wrangling with the Data


Mark DePristo
Title: Group Leader, Medical and Population Genetics, Broad Institute of Harvard and MIT
Education: PhD, Cambridge University, 2004
Recommended by: Richard Durbin

Mark DePristo's path to the Broad Institute was rather circuitous, but it is one he is glad he followed. After earning an undergraduate degree in mathematics and computer science, DePristo went to Cambridge University on a Marshall Scholarship to pursue theoretical computer science. While there, he caught the computational biology bug and dove headfirst into protein folding and structure modeling, and he found that he couldn't get enough. "It's a common place for people to end up first working when they make their way into biology from computer science or mathematics," DePristo says. Still looking to round out his transition from computer science to a career in biology, DePristo then decided to learn how the biotechnology business actually works. "I did a short stint at a biotechnology consulting firm called LEK Consulting, but while I was there, friends of mine at the Broad convinced me that I would enjoy working here," he says. "And it just came about that I had a good fit in my background for working on the kind of problems that the population medical genetics program here does."

In his current capacity at the Broad, DePristo leads a team of computational biologists and software engineers working on algorithms for next-generation sequencing data and genetic analysis. Even though DePristo's group is relatively new, it has already made waves by delivering what he considers to be the next generation of genome analysis technology. "We have a tool kit that I think is extremely powerful and has vastly improved our productivity, which is called the Genome Analysis Toolkit," he says. "This has been invaluable in both dealing with the high-performance computing issues that we have and also making it easy to ask basic science questions. … I'm very happy with it and it's already been fairly influential within the Broad and outside because of the ease with which it allows you to actually analyze this data."

The major challenge DePristo faces at the Broad is, of course, the storage and analysis of next-gen sequencing data. "The biggest issues we have with the computational side now is just the scale of the kind of projects we envision doing. We want to be doing whole genome sequencing of thousands of people in the next year or two. That involves massive CPU resources and storage resources, so just finding the air conditioners that will cool that kind of installation is really difficult," he says. "The Broad is really running at the limit of things, like how much energy can we pull into the Broad to run these machines, where do we put them, how do we keep them cool? We're going to generate 10 times more data next year, thousands of genomes, so our biggest priority is being able to store the data."

Looking ahead

DePristo says that in the not-so-distant future, he would like to have a general suite of tools for working with next-gen sequencing data that lets researchers mine all the information. "Somebody could ask questions and say, 'I think I'm going to sequence the genomes of 20,000 people' and then just have that be routine enough that, after some period of experimental work, this data would flow through this analysis suite and it would tell you things that were potentially interesting," he says. "We want to solve things like cancer and diabetes, but we are also interested in some basic science questions, such as ancestry, variations in populations, natural selection in human beings."

And the Nobel goes to...

DePristo says if he were to win the Nobel Prize, he hopes that it would be for "technological advancements that would enable many people to make groundbreaking discoveries. ... The best thing would be to be seen as someone who really made it possible for other people to do good science, too."
