IBM's life science business unit is tackling a new "grand challenge" — and a new subject area — in an ambitious partnership with the National Geographic Society.
Announced last week, the five-year initiative, called the Genographic Project, aims to collect more than 100,000 DNA samples from indigenous populations around the world in order to map global human migratory history. IBM will provide hardware and database systems to support the project, and researchers at the company's Computational Biology Center will also develop new analytical, statistical, and visualization methods to identify patterns in the sample data and use them to reconstruct migration routes.
Carol Kovac, industry general manager of IBM Life Sciences, told BioInform that unlike the bulk of IBM's life science activities to date, the Genographic Project has "no relationship to healthcare, human health, or pharmaceutical discovery." This departure is "by design," Kovac said, because "we do believe that, from the perspective of the use of this data, it really has been agreed that this is for an anthropological study and that there is no issue around the idea that we're trying to exploit native populations."
Although it is a large-scale study of human genetic variation, Kovac said that the Genographic Project's focus on anthropology sets it apart from the International HapMap Project and other population genetics efforts that are rooted in human health.
Kovac likened the project to IBM's BlueGene supercomputer project, another so-called "grand challenge" that the company set for itself when it first entered the life science market in 2000. At the time, "people said, 'Why are you doing this?'" Kovac recalled. "And the answer was because it's a big, audacious goal that will drive the world forward in thinking about supercomputing for the life sciences, and it will be ahead of the curve in every sense."
The project, estimated to cost around $40 million, will be led by Spencer Wells, a population geneticist and National Geographic's explorer-in-residence, and it has been funded by the Waitt Family Foundation. It's not clear how much funding the foundation or National Geographic is contributing toward the project. Kovac said that Wells and National Geographic first approached IBM with plans for the project about a year and a half ago.
Research teams at 10 centers — in China, Russia, India, Lebanon, the US, Brazil, South Africa, the UK, France, and Australia — will collect and analyze DNA samples from indigenous populations in those regions. Each center will process around 10,000 samples. In addition, the project is seeking participation from the general public by selling $99.95 kits that will allow people to submit their own cheek swab samples.
The project will focus on "known markers of descent" from the Y chromosome and mitochondrial DNA, said Saharon Rosset, a researcher at IBM's Computational Biology Center. IBM researchers plan to tackle a number of statistical and data-analysis challenges associated with the initiative, Rosset said.
While several software tools already exist for phylogeny reconstruction and population genetics analysis, Rosset said that the "resolution" of existing methods could be improved with techniques that the IBM team has already begun developing.
As an example, Rosset said that the public participation side of the project is challenging because researchers will be required to quickly and accurately classify new samples into haplogroups. Because of the large scale of the project, this is "a computational challenge and a statistical challenge," he said. In addition, the IBM team will explore methods for assessing which additional markers may need to be typed in order to resolve ambiguities.
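The two tasks Rosset describes can be illustrated with a minimal sketch. It is not IBM's method, which the article does not detail. It assumes a toy reference table in which each haplogroup is defined by a set of markers expected in the derived state. The haplogroup names (`HG_A`, `HG_B`, `HG_C`), marker names (`m1` to `m3`), and both function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy reference (not real data): each haplogroup maps to the
# set of markers expected to be in the derived state for that haplogroup.
REFERENCE: Dict[str, set] = {
    "HG_A": {"m1"},
    "HG_B": {"m1", "m2"},
    "HG_C": {"m1", "m3"},
}
ALL_MARKERS = {"m1", "m2", "m3"}


def candidate_haplogroups(sample: Dict[str, bool]) -> List[str]:
    """Return the haplogroups consistent with every typed marker.

    `sample` maps a marker name to True (derived) or False (ancestral).
    Markers that have not been typed are simply absent from the dict.
    """
    candidates = []
    for haplogroup, derived in REFERENCE.items():
        if all((marker in derived) == state for marker, state in sample.items()):
            candidates.append(haplogroup)
    return sorted(candidates)


def markers_to_type(sample: Dict[str, bool], candidates: List[str]) -> List[str]:
    """Return untyped markers whose expected state differs among candidates.

    Typing any of these markers would rule out at least one candidate.
    """
    untyped = ALL_MARKERS - set(sample)
    informative = []
    for marker in sorted(untyped):
        expected_states = {marker in REFERENCE[hg] for hg in candidates}
        if len(expected_states) > 1:
            informative.append(marker)
    return informative


if __name__ == "__main__":
    partial = {"m1": True}  # only one marker typed so far
    hits = candidate_haplogroups(partial)
    print(hits)                            # ambiguous: several candidates
    print(markers_to_type(partial, hits))  # markers that would resolve it
```

A production classifier would also have to handle genotyping errors and missing calls at the scale of 100,000 or more samples, which is where the statistical challenge Rosset mentions comes in. This sketch treats every typed marker as exact.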
Kovac said that the project will appeal to other researchers at the company's Watson research lab working in areas that are "peripheral to genetics studies," such as natural language processing and linguistics. The Genographic Project's focus on human migratory patterns has direct links to cultural evolution, Kovac noted, so IBM researchers working in the area of computational linguistics are expected to participate as the project gathers steam.
Kovac would not provide any details on the project's short-term goals, but said that the company has "already launched work around the database," and that the research teams are in place at the 10 collection centers.
"I expect that we will progress quickly on this," she said.
IBM will also provide the project's principal investigators with IBM ThinkPads "containing biometric fingerprint security that will protect the data gathered in the field," according to a company fact sheet. The company will also provide online collaboration tools, as well as blade servers to support the Genographic website (http://www5.nationalgeographic.com/genographic/), which will include "specialized analytical tools to enable real-time access to genetic data."