Cornell University researchers have developed an algorithm that they said should drastically reduce the time it takes to create comparative gene maps.
Such maps are currently compiled by hand, using data collected in wet labs and analyzed with software that can interpret only one map at a time, a process that can take months.
The algorithm can perform the same task in a matter of hours. It draws on natural language processing techniques to keep track of gene labels and to decide which sequences belong together based on the overall trend rather than on individual markers.
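The article does not describe the algorithm's internals. As a rough illustration of how a method can group markers by an overall trend instead of marker by marker, the following sketch uses a Viterbi-style dynamic program, a technique common in natural language processing. It assumes that each marker along one rice chromosome comes with a set of candidate maize chromosomes on which a matching marker was found, and it assigns one label per marker while charging a penalty every time the label changes. The function name `segment_labels`, the labels `M1`, `M3` and `M5`, and the `switch_penalty` value are hypothetical; this is not the Cornell team's implementation.

```python
from typing import List, Set


def segment_labels(candidates: List[Set[str]], labels: List[str],
                   switch_penalty: float = 2.0) -> List[str]:
    """Illustrative sketch only, not the published Cornell algorithm.

    Assigns one label per marker, minimizing the number of markers whose
    label is not among their candidates plus switch_penalty per label change.
    """
    n, m = len(candidates), len(labels)
    if n == 0:
        return []
    inf = float("inf")
    # cost[i][k]: best total cost for markers 0..i with marker i labeled labels[k]
    cost = [[inf] * m for _ in range(n)]
    # back[i][k]: label index chosen for marker i-1 on that best path
    back = [[0] * m for _ in range(n)]
    for k, lab in enumerate(labels):
        cost[0][k] = 0.0 if lab in candidates[0] else 1.0
    for i in range(1, n):
        for k, lab in enumerate(labels):
            mismatch = 0.0 if lab in candidates[i] else 1.0
            best_prev, best_cost = 0, inf
            for j in range(m):
                # Staying on the same label is free; switching costs switch_penalty.
                c = cost[i - 1][j] + (0.0 if j == k else switch_penalty)
                if c < best_cost:
                    best_prev, best_cost = j, c
            cost[i][k] = best_cost + mismatch
            back[i][k] = best_prev
    # Trace back from the cheapest final label to recover the full labeling.
    k = min(range(m), key=lambda j: cost[n - 1][j])
    path = [labels[k]]
    for i in range(n - 1, 0, -1):
        k = back[i][k]
        path.append(labels[k])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical candidate maize chromosomes for eight markers on one rice chromosome.
    markers = [{"M1"}, {"M1"}, {"M5"}, {"M1"}, {"M1"}, {"M3"}, {"M3"}, {"M3"}]
    print(segment_labels(markers, ["M1", "M3", "M5"], switch_penalty=2.0))
    # ['M1', 'M1', 'M1', 'M1', 'M1', 'M3', 'M3', 'M3']
```

With a penalty of 2, the single stray `M5` marker is absorbed into the surrounding `M1` block, while the run of three `M3` markers at the end is kept as its own segment. Raising or lowering the penalty trades off sensitivity to short segments against robustness to noisy markers.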
Susan McCouch, professor of plant breeding at Cornell, said: “This, we think, is the beginning of the replacement of manual alignment… The algorithm provides a mechanism by which you can come up with the same comparative solution again and again because you know what the parameters were when you entered into it.”
The algorithm was developed using original genome data on rice and maize generated in McCouch’s lab. A comparison of the two genomes produced by the algorithm closely matched a map made by hand, and it revealed a small “footprint” of an ancestral chromosome in maize that did not turn up in the handmade map.
The Cornell researchers also tested the algorithm on a comparison of the mouse and human genomes, based on data available at the Jackson Laboratory website. “It worked beautifully,” said McCouch, “so we are pretty confident that it will work with any pairwise combination where the genetic markers that are used to construct the maps are the same.”
Automated comparative mapping should also help validate single-species maps, according to McCouch. Rapid species comparison can reveal inconsistencies that serve as a starting point for a hypothesis that can then be tested in the wet lab. The approach can also be used to compare numerous sets of comparative maps, permitting the comparison of a rice-maize map with a human-mouse map, for example. “No one’s been able to do that before,” said McCouch.
Debra Goldberg, a graduate student in applied mathematics, developed the method in collaboration with McCouch and Jon Kleinberg, assistant professor of computer science. Goldberg will present the work at the Plant and Animal Genome IX conference in San Diego later this month.
The team is developing a software implementation of the algorithm that should be ready within a year. McCouch told BioInform that the software would be released under “copyleft” (as opposed to “copyright”) terms, which would make it freely available, with the stipulation that any improvements made to the code must credit the original source and be posted publicly.
Under copyleft, others may commercialize a presentation of the algorithm, but they cannot own the algorithm or restrict its public release. This protection is fundamental to university-based software development, according to McCouch, who pointed to a common drawback of releasing software into the private sector: “You can never get access to it again, so your own students can’t work on improving it.”
“It’s truly a communal product that we’d like to see remain in the public domain,” she said.