Title: Assistant Professor, Harvard Medical School
Education: PhD, Yale University, 2004
Recommended by: Isaac Kohane
There's a lot of information floating around on the Web, and John Brownstein is working on a way to harness it to study disease patterns. He started out as an epidemiologist studying infectious diseases; for his PhD, he gathered data the old-fashioned way — in the field. In public health, he quickly found, there is a paucity of data. "I moved into the informatics area where there's new ways of capturing huge, large data sets — almost like drinking out of a fire hose of data," he says. "A lot of my work now is [on] how do we treat these large data sets, whether it's clinical data systems or more informal data ... and how do we take that information and say something useful about population health?"
Brownstein's main project is HealthMap.org. This tool crawls tens of thousands of websites in seven languages every hour, he says. It's on the lookout for information related to public health, such as ongoing disease outbreaks. That data, after being processed, filtered, and validated, is made available to public health agencies and the general population on the website as a map. "We take it, we process it, we filter it for noise; we also combine aggregate data that is about the same event and then we make that available in a visualization frame or platform — for instance, the map," he says. Brownstein also released an iPhone app, called Outbreaks Near Me, that allows users to view real-time outbreak information and report what they see.
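The filter-and-combine step Brownstein describes can be pictured with a small Python sketch. This is an illustration only, not HealthMap's actual code: the `Report` record, its fields, and the rule of grouping reports by matching disease and location are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical record for one crawled item (not HealthMap's schema)."""
    source: str
    disease: str
    location: str


def filter_noise(reports):
    """Drop reports that lack a recognized disease or location."""
    return [r for r in reports if r.disease.strip() and r.location.strip()]


def combine_by_event(reports):
    """Group reports that appear to describe the same event.

    Assumption: two reports describe the same event when their disease
    and location match after trimming whitespace and lowercasing.
    """
    events = defaultdict(list)
    for r in reports:
        key = (r.disease.strip().lower(), r.location.strip().lower())
        events[key].append(r)
    return dict(events)


def summarize(events):
    """Return one summary per event, most-reported first."""
    summaries = [
        {
            "disease": disease,
            "location": location,
            "report_count": len(group),
            "sources": sorted({r.source for r in group}),
        }
        for (disease, location), group in events.items()
    ]
    return sorted(
        summaries,
        key=lambda e: (-e["report_count"], e["disease"], e["location"]),
    )


# Made-up sample data, for illustration only.
sample_reports = [
    Report("news-a", "Influenza", "Mexico City"),
    Report("news-b", "influenza", "mexico city"),
    Report("blog-c", "", "Boston"),
    Report("news-d", "Dengue", "Rio de Janeiro"),
]

sample_events = combine_by_event(filter_noise(sample_reports))
for summary in summarize(sample_events):
    print(summary)
```

A real system would need far more careful matching than an exact disease-and-location key, which is part of the "needles in haystacks" problem Brownstein raises; the sketch only shows the shape of the pipeline, with each summary being the kind of record that could be placed on a map.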
That adds up to a lot of data, and Brownstein says that managing it is a challenge. "How do you manage large data sets? We are looking for needles in haystacks," he says. "The data, when huge in volume, has a huge amount of noise as well, so how do you find those needles in the haystack? How do you reduce false alarms in the data? How do you integrate different data streams, disparate information resources into one common picture?"
For instance, when Brownstein and his colleagues looked back through their data, they could see that they had picked up the H1N1 outbreak in Mexico early on. At the time, however, it looked like a lot of other events that were going on. "How do we distinguish that event from everything else and really make a judgment that someone should have intervened and potentially have stopped the spread of swine flu?" he asks.
While Web 2.0 tools such as Twitter represent a large source of data — and Brownstein says there will be even more opportunities emerging in that space — he thinks that eventually he may be able to move away from these informal data sources. "If public health infrastructure and electronic medical records get to the point where there's value, we may not even need to go to these informal, less clean data sources. That would be ideal," he says.
Publications of note
In a perspective article that appeared in the New England Journal of Medicine last May, Brownstein and his colleagues introduced their surveillance method. "HealthMap is an openly available public health intelligence system that uses data from disparate sources to produce a global view of ongoing infectious disease threats. It has between 1,000 and 150,000 users per day, including public health officials, clinicians, and international travelers," they write. And in a related article in the same issue, they describe how their tool picked up the early stages of the H1N1 outbreak.
And the Nobel goes to...
If Brownstein were awakened by a call from the Nobel committee, he says, "if it's for stopping the next pandemic, the next H1N1, that would be great."