There are generally two ways to speed whole-genome comparisons: faster algorithms or faster hardware. But Synamatix, a bioinformatics startup based in Kuala Lumpur, Malaysia, says it has developed a new approach to this familiar problem that focuses instead on the database that stores the genomic sequence information.
This database, called SynaBase, sits at the hub of a suite of products that the company is developing to mine the data stored within it. Johan Poole-Johnson, global marketing manager, said the database relies on pattern-recognition technology that assigns “significance” to certain sequence patterns and then stores the relationships between these patterns. Because the database stores each of these patterns only once, in a hierarchical structure, multiple genomes can be added to the system. These genomes would occupy a much smaller amount of space in SynaBase than they would in relational databases or flat files, which retain redundant information for all the raw sequence data across entire genomes, he said.
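The storage idea described above can be illustrated with a toy sketch. It is not Synamatix's method, which the company has not published in detail. It assumes, purely for illustration, that a "pattern" is a fixed-length chunk of sequence; the class name `PatternStore` and its methods are hypothetical. Each distinct chunk is stored once, and every genome is kept only as a list of references to those shared chunks.

```python
class PatternStore:
    """Toy store in which each distinct fixed-length chunk is kept once.

    Assumption: a "pattern" is simply a chunk of chunk_len characters.
    The real SynaBase pattern definition is not public.
    """

    def __init__(self, chunk_len=4):
        self.chunk_len = chunk_len
        self.pattern_ids = {}  # chunk string -> integer id
        self.patterns = []     # integer id -> chunk string
        self.genomes = {}      # genome name -> list of pattern ids

    def add_genome(self, name, sequence):
        """Split a sequence into chunks and store it as a list of ids."""
        ids = []
        for start in range(0, len(sequence), self.chunk_len):
            chunk = sequence[start:start + self.chunk_len]
            if chunk not in self.pattern_ids:
                # First time this chunk is seen: store it exactly once.
                self.pattern_ids[chunk] = len(self.patterns)
                self.patterns.append(chunk)
            ids.append(self.pattern_ids[chunk])
        self.genomes[name] = ids

    def get_genome(self, name):
        """Rebuild the original sequence from the shared chunks."""
        return "".join(self.patterns[i] for i in self.genomes[name])


store = PatternStore(chunk_len=4)
store.add_genome("genome_a", "ACGTACGTTTGA")
store.add_genome("genome_b", "ACGTTTGAACGT")
```

In this sketch the two 12-base sequences share the same two chunks, so adding the second genome stores no new sequence data, only a new list of references.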
In addition to the reduced storage requirements, Poole-Johnson said the approach improves upon traditional methods for comparative genomics, which focus on how frequently certain patterns occur rather than on their significance. Using the analogy of an English sentence to describe how SynaBase looks at the genetic code, Poole-Johnson said that traditional, similarity-based methods would extract terms like “the,” “an,” and “to,” “because they occur frequently in the language, but they don’t necessarily provide meaning.” SynaBase, by comparison, would be able to home in on the nouns and verbs that get the speaker’s point across, he said.
SynaBase works in concert with SynaSearch and SynaCompare, the company’s genome comparison and visualization tools, to quickly align long genome sequences, and even entire genomes. Synamatix said it has aligned two bacterial genomes in less than 24 seconds, and the human X and Y chromosomes in under 2.5 minutes. A self-comparison of the whole human genome that took 4.5 days (108 hours) to run on a single CPU with MUMmer would take only 44 hours using SynaBase, according to the company.
Henk Heus, vice president of R&D at Gene-IT and an expert in comparative genomics, said that a new architecture for storing sequences “is a good idea — the result is performance and new science that is not achievable with traditional approaches.” However, Heus noted, from the information that the company has made public about its technology, it’s difficult to assess its accuracy, or how well it is able to “coax syntax and meaning out of the patterns.”
According to Poole-Johnson, one demonstration of the company’s technology that proves its pattern-recognition capabilities involves removing all the spaces between the words in a book, and then feeding the book into SynaBase. Once the database identifies the patterns and characters in the book, “we can output those words and their spaces as identical as they were before.”
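A much simpler stand-in for that demonstration can be sketched as follows. Poole-Johnson's description implies that SynaBase identifies the patterns from the text itself; this sketch instead assumes the vocabulary is already known and uses ordinary dynamic programming to put the spaces back. The function name `segment` and the sample sentence are hypothetical.

```python
def segment(text, vocabulary):
    """Split text into vocabulary words; return None if no split exists.

    Assumption: the vocabulary is given in advance and is non-empty.
    This is a standard word-break routine, not the SynaBase algorithm.
    """
    # best[i] holds one valid list of words covering text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    max_len = max(len(word) for word in vocabulary)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                best[end] = best[start] + [word]
                break
    return best[len(text)]


original = "the database stores each pattern once"
vocabulary = set(original.split())
squeezed = original.replace(" ", "")  # remove every space
restored = " ".join(segment(squeezed, vocabulary))
```

For this sample sentence only one split into vocabulary words is possible, so the restored text matches the original. With a larger vocabulary, several splits may be valid, and the routine returns just one of them.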
The 15-person company was founded in 2001 by Robert Hercus, the inventor of the pattern-recognition technology that lies at the heart of SynaBase. Hercus has formed several other firms around the same core algorithms, but Synamatix is the first of these to commercialize its technology, Poole-Johnson said. The company is funded by “a combination of venture capital and angel funding,” he said, and has also received a $700,000 grant from the government of Malaysia to develop and commercialize its technology.
The company’s product suite will eventually contain at least six additional applications, including SynaMine for sequence analysis, SynaStruct for 2D structural information, and SynaPath for pathway data.