Ben Langmead: New Tools for New Questions
Assistant professor, Johns Hopkins University
Recommended by Steven Salzberg, Johns Hopkins University
Before Ben Langmead went to graduate school at the University of Maryland, he worked at a company where he wrote software that quickly scanned passing network traffic for bad patterns, such as strings of code associated with spam or phishing emails.
When he started his graduate studies, he heard from Steven Salzberg about all the data that next-generation DNA sequencers were producing, and Langmead thought that he could apply some of those network-scanning ideas to the genomics problem of analyzing such massive amounts of data.
"I started to work with him and other grad students on this problem, [and] I realized very quickly that those network traffic-scanning ideas were not really the right ones, but that got me hooked anyway," Langmead said. "And I've been in that field since trying to find easier ways to analyze gobs of genomics data."
Now in his own lab at Hopkins, Langmead is taking a three-pronged approach to try to make it easier to understand genomic data.
As a graduate student, he worked on the development of read aligners like Bowtie and Bowtie 2, and he said that those and other aligners sometimes get tripped up and make mistakes, mostly due to the repetitive nature of some sections of the genome. One focus of his lab is to try to estimate how commonly those errors occur.
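To see why repeats cause trouble, consider a toy case. The sketch below is only an illustration under stated assumptions: the reference string, the reads, and the helper `find_exact_hits` are all hypothetical, and plain exact string matching stands in for what a real aligner does (it is not how Bowtie works internally). A read that falls entirely inside a repeated stretch matches in more than one place, so its true origin cannot be decided from the sequence alone, while a read that overlaps unique flanking sequence matches only once.

```python
def find_exact_hits(reference: str, read: str) -> list[int]:
    """Return every 0-based offset where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        # Resume one base later so overlapping matches are not skipped.
        start = reference.find(read, start + 1)
    return hits


# Hypothetical toy reference: the same 8-base stretch appears twice.
REPEAT = "ACGTACGA"
reference = "TTGCA" + REPEAT + "GGCTTAGC" + REPEAT + "CCATG"

# This read lies entirely inside the repeat, so its placement is ambiguous.
repeat_read = "CGTACG"

# This read overlaps the unique sequence to the left of the first repeat copy.
anchored_read = "GCAACGTA"

repeat_hits = find_exact_hits(reference, repeat_read)
anchored_hits = find_exact_hits(reference, anchored_read)

print("read inside repeat matches at:", repeat_hits)
print("read with unique flank matches at:", anchored_hits)
```

An aligner faced with the first read has to pick one of the candidate positions or report the ambiguity, and estimating how often such picks are wrong is the kind of question described above.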
Another project in the Langmead lab is to scale up common genomics pipelines. If it were easier to run them on a number of datasets at once, it would save researchers time and money, he said.
Langmead and his team are also interested in making it easier for researchers to use archived data. The way that repositories like the Sequence Read Archive are set up makes it difficult to query their contents. The data, Langmead said, is stored as compressed blocks, and to use it, a researcher would have to download it, decompress it, and then use a standard tool to try to answer their question.
"If finding something on the Internet meant looking up a list of compressed archives and web pages and trying to find the right one and downloading it and looking through it, no one would ever do that," he said. "But it is very easy because Google knows how to index the entire World Wide Web and expose an interface where they answer queries."
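The contrast Langmead draws can be sketched in a few lines. Everything here is an assumption made for illustration: the three in-memory "archives", their names, the reads inside them, and the fixed query length `K` are invented, and a plain dictionary stands in for whatever index a real system would use. The first function answers a question the slow way, by decompressing every archive and reading through it; the second answers the same question from an index that was built once in advance.

```python
import gzip
from collections import defaultdict

# Hypothetical archives: each is a gzip-compressed block of short reads.
archives = {
    "run_A": gzip.compress(b"ACGTTGCA\nGGATCCAA\n"),
    "run_B": gzip.compress(b"TTTTCCCC\nACGTACGT\n"),
    "run_C": gzip.compress(b"GGATCCAA\nCATGCATG\n"),
}

K = 4  # Assumed query length; the index below only answers K-base queries.


def scan_for(query: str) -> list[str]:
    """Decompress every archive and search it; repeated for each question."""
    found = []
    for name, blob in archives.items():
        reads = gzip.decompress(blob).decode().splitlines()
        if any(query in read for read in reads):
            found.append(name)
    return sorted(found)


def build_index(k: int) -> dict[str, set[str]]:
    """Decompress each archive once and record which runs contain each k-mer."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, blob in archives.items():
        for read in gzip.decompress(blob).decode().splitlines():
            for i in range(len(read) - k + 1):
                index[read[i:i + k]].add(name)
    return index


index = build_index(K)


def lookup(query: str) -> list[str]:
    """Answer a K-base query from the prebuilt index without decompressing."""
    return sorted(index.get(query, set()))


print(scan_for("GATC"), lookup("GATC"))
```

Both routes give the same answer, but the scan repeats the decompression work for every new question, while the index pays that cost once and then serves each lookup directly.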
Langmead said that he considers himself a computer scientist, but he works at the intersection of computer science and biology. Straddling those two worlds can be a challenge, he said. "We're sort of on this cusp where we are trying to do two things at once, and it's nobody's first priority but our own. It's hard to navigate that sometimes," he added.
Paper of note
Langmead's first paper, published in Genome Biology in 2009, introduced Bowtie, now a commonly used sequence aligner. When it was published, Bowtie was much faster than the other tools in use at the time. In the paper, Langmead and his colleagues said that Bowtie could align 35-base-pair reads at a rate of more than 25 million reads per CPU-hour. This, they added, was more than 35 times faster than Maq and 300 times faster than SOAP.
"What was interesting is that none of us had any idea that it would matter or be a hit at all," Langmead said, adding that they soon began hearing from researchers who were using their tool.
"That was probably when it really hit me that making efficient software that is easy to use and well documented was itself a big contribution," he said.
Until recently, sequencing was often the most expensive part of any project, but Langmead argued that the biggest expense now is collecting samples, particularly human samples. Re-using datasets, he noted, could help rein in that cost and make research more efficient.
"I think going forward, a very interesting area to look at is how to make good use of existing data rather than how to analyze the next dataset that comes off the sequencer," he said.
And the Nobel goes to…
While Langmead said that he is not likely to win the Nobel Prize, if the committee does surprise him with one, he hopes to share it with other researchers who used his computational methods to ask new questions.