By Bernadette Toner
Who says you can’t compare apples and oranges? If you’re talking genomic maps, the exercise has been simplified.
Until now, comparative maps have been compiled by hand, using data collected in wet labs and analyzed with software that can only interpret one map at a time — a process that can take months.
But Susan McCouch, professor of plant breeding at Cornell University, intends to speed things up. Her lab has developed an algorithm that can perform the same task in a matter of hours. Drawing on natural language processing techniques, it keeps track of the labels assigned to genes and decides which sequences go together on the basis of an overall trend.
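The article does not spell out the method, so what follows is only an illustration of the general idea, not the Cornell team's algorithm. It assumes the problem is framed as a sequence-labeling task of the kind common in natural language processing. Each marker along one chromosome carries the set of chromosomes in the other species where a matching marker was found. A Viterbi-style dynamic program then picks one label per marker, charging a hypothetical `mismatch_cost` when the chosen label disagrees with the data and a hypothetical `switch_cost` each time the label changes. The function name, the costs, and the chromosome labels ("Z1", "Z3", "Z5") are all invented for this sketch.

```python
from typing import Dict, List, Sequence, Set


def label_markers(
    hits: Sequence[Set[str]],
    labels: Sequence[str],
    mismatch_cost: float = 1.0,
    switch_cost: float = 2.0,
) -> List[str]:
    """Assign one label per marker, minimizing mismatches plus label switches.

    Hypothetical illustration: a generic Viterbi-style dynamic program,
    not the published Cornell method.
    """
    if not hits:
        return []

    def emit(i: int, lab: str) -> float:
        # Free if the marker actually maps to this label, otherwise penalized.
        return 0.0 if lab in hits[i] else mismatch_cost

    def trans(prev: str, lab: str) -> float:
        # Staying on the same label is free; switching costs a penalty.
        return 0.0 if prev == lab else switch_cost

    # cost[lab] = best total cost of labeling markers 0..i with marker i labeled lab
    cost: Dict[str, float] = {lab: emit(0, lab) for lab in labels}
    back: List[Dict[str, str]] = []  # back[i-1][lab] = best label for marker i-1

    for i in range(1, len(hits)):
        new_cost: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for lab in labels:
            prev = min(labels, key=lambda p: cost[p] + trans(p, lab))
            new_cost[lab] = cost[prev] + trans(prev, lab) + emit(i, lab)
            pointers[lab] = prev
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest labeling back from the last marker.
    path = [min(labels, key=lambda lab: cost[lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # A lone "Z5" hit inside a run of "Z1" hits is treated as noise,
    # while the sustained run of "Z3" hits starts a new segment.
    markers = [{"Z1"}, {"Z1"}, {"Z5"}, {"Z1"}, {"Z1"}, {"Z3"}, {"Z3"}, {"Z3"}]
    print(label_markers(markers, ["Z1", "Z3", "Z5"]))
```

The switch penalty is what makes the labeling follow an "overall trend": a single stray match is cheaper to ignore than to honor with two label changes, so only sustained runs of evidence produce a new segment.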
“This, we think, is the beginning of the replacement of manual alignment,” says McCouch. “The algorithm provides a mechanism by which you can come up with the same comparative solution again and again because you know what the parameters were when you entered into it.”
The algorithm was developed using original data on the rice and maize genomes generated in McCouch’s lab. When the team used it to compare the two genomes, the resulting map closely resembled a similar map made by hand, and it revealed a small “footprint” of an ancestral chromosome in maize that had not turned up in the handmade map.
The Cornell researchers also tested the algorithm on a comparison of the mouse and human genomes, based on data available at the Jackson Laboratory website. “It worked beautifully,” says McCouch, “so we are pretty confident that it will work with any pairwise combination where the genetic markers that are used to construct the maps are the same.”
Automated comparative mapping should also help validate single-species maps, McCouch says. Rapid species comparison can reveal inconsistencies that serve as a starting point for a hypothesis that can then be tested in the wet lab. The approach can also be used to compare numerous sets of comparative maps, permitting the comparison of a rice-maize map with a human-mouse map, for example. “No one’s been able to do that before,” McCouch says.
Debra Goldberg, a graduate student in applied mathematics, developed the method in collaboration with McCouch and Jon Kleinberg, assistant professor of computer science. Goldberg presented the work at the Plant and Animal Genome IX conference in San Diego in January.
The team is developing a software implementation of the algorithm that should be ready within a year.
McCouch says the software will be released under “copyleft” terms rather than a conventional, restrictive copyright. That would make it freely available, with the stipulation that any improvements to the code must credit the original source and be posted publicly. “It’s truly a communal product that we’d like to see remain in the public domain,” she says. Anyone can commercialize a presentation of the algorithm, but no one can own the algorithm itself or restrict its public release.
This is fundamental for university-based software development, according to McCouch. Otherwise “you can never get access to it again, so your own students can’t work on improving it.”