Matchmaking for Genomes, Automated

By Bernadette Toner

Who says you can’t compare apples and oranges? If you’re talking genomic maps, the exercise has been simplified.

Until now, comparative maps have been compiled by hand, using data collected in wet labs and analyzed with software that can only interpret one map at a time — a process that can take months.

But Susan McCouch, professor of plant breeding at Cornell University, intends to speed things up. Her lab has developed an algorithm that can perform the same task in a matter of hours. It draws on natural language processing techniques to keep track of gene labels and to decide which sequences belong together based on an overall trend.
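
The article does not describe the Cornell algorithm beyond this summary. As a purely illustrative sketch, assume each marker along a chromosome of one genome is tagged with the chromosome where its counterpart maps in the other genome. Deciding what goes together "based on an overall trend" can then be modeled as a Viterbi-style dynamic program, a sequence-labeling technique common in natural language processing, that tolerates a few disagreeing markers rather than opening a new block for each one. The function name `segment_by_trend`, the `switch_penalty` parameter and the chromosome labels are hypothetical; this is not the McCouch lab's implementation.

```python
from typing import Dict, List, Sequence


def segment_by_trend(labels: Sequence[str], switch_penalty: float = 2.0) -> List[str]:
    """Hypothetical illustration, not the Cornell method.

    labels[i] is the chromosome (in the other genome) where marker i's
    counterpart maps. Returns one block label per marker, minimizing:
      (# markers whose own label differs from their block's label)
      + switch_penalty * (# boundaries between blocks).
    Solved exactly with a Viterbi-style dynamic program.
    """
    if not labels:
        return []
    states = sorted(set(labels))
    # cost[s]: best total cost so far with the current marker assigned to block s
    cost: Dict[str, float] = {s: (0.0 if labels[0] == s else 1.0) for s in states}
    back: List[Dict[str, str]] = []  # back[i - 1][s]: best block for marker i - 1 given s at i

    for i in range(1, len(labels)):
        new_cost: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            # Staying in block s is free; switching from another block costs switch_penalty.
            best_prev, best_val = s, cost[s]
            for p in states:
                val = cost[p] + (switch_penalty if p != s else 0.0)
                if val < best_val:
                    best_prev, best_val = p, val
            # Add 1 if this marker disagrees with the block it is assigned to.
            new_cost[s] = best_val + (0.0 if labels[i] == s else 1.0)
            pointers[s] = best_prev
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final block to recover the full assignment.
    state = min(states, key=lambda s: cost[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical marker labels: one stray "chr9" marker inside a "chr1" run.
    markers = ["chr1", "chr1", "chr1", "chr9", "chr1", "chr1",
               "chr5", "chr5", "chr5", "chr5"]
    print(segment_by_trend(markers, switch_penalty=2.0))
    # prints ['chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr5', 'chr5', 'chr5', 'chr5']
```

With a penalty of 2.0, the lone `chr9` marker is absorbed into the surrounding `chr1` block, because opening and closing a separate block would cost more than one mismatch; with a smaller penalty such as 0.4 it stands as its own block. Because the penalty is fixed up front, rerunning the sketch on the same input always gives the same answer.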

“This, we think, is the beginning of the replacement of manual alignment,” says McCouch. “The algorithm provides a mechanism by which you can come up with the same comparative solution again and again because you know what the parameters were when you entered into it.”

The algorithm was developed using original data on the rice and maize genomes generated in McCouch's lab. The comparative map it produced for the two genomes closely resembled one made by hand, and it revealed a small "footprint" of an ancestral chromosome in maize that the handmade map had not turned up.

The Cornell researchers also tested the algorithm on a comparison of the mouse and human genomes, based on data available at the Jackson Laboratory website. “It worked beautifully,” says McCouch, “so we are pretty confident that it will work with any pairwise combination where the genetic markers that are used to construct the maps are the same.”

Automated comparative mapping should also help validate single-species maps, McCouch says. Rapid cross-species comparison can reveal inconsistencies that serve as starting points for hypotheses to be tested in the wet lab. The approach can also be used to compare multiple sets of comparative maps, for example a rice-maize map against a human-mouse map. "No one's been able to do that before," McCouch says.

Debra Goldberg, a graduate student in applied mathematics, developed the method in collaboration with McCouch and Jon Kleinberg, assistant professor of computer science. Goldberg presented the work at the Plant and Animal Genome IX conference in San Diego in January.

The team is developing a software implementation of the algorithm that should be ready within a year.

McCouch says that the software will be released under “copyleft” (as opposed to copyright) status, which would make it freely available, but with a stipulation that any improvements made to the code must credit the original source and be posted publicly. “It’s truly a communal product that we’d like to see remain in the public domain,” she says. Anybody can commercialize the presentation of the algorithm, but they can’t own the algorithm or restrict its public release.

This is fundamental for university-based software development, according to McCouch. Otherwise “you can never get access to it again, so your own students can’t work on improving it.”