NEW YORK – Goenomics, a bioinformatics spinout from Germany's University of Göttingen, is hoping to upend the world of genome annotation through what it calls "Google for biological sequences."
Customers supply genome sequences to Goenomics and choose which types of genes they want information on, such as protein-coding genes, noncoding RNA genes, and transposons. The company then uses a mix of automated and manual processes to generate a report on these genes of interest.
"In our approach, we come from the biology," CEO and Cofounder Martin Kollmar said. For example, when annotating a novel assembly, "certain proteins have a certain size and [amino acid] content, so we look for that content in the new genome."
That's different from more mathematically driven approaches, which train algorithms with parameters and create models for features such as exon borders. "[The algorithm] then takes the models and goes over the sequence to try and find where the model fits," he said.
This can lead to false positives and negatives, he said. A mathematical approach might identify a transposon as a gene, for example, whereas his firm "actively" identifies transposons and excludes them from reports on protein-coding genes. The difference can lead to approximately 15 percent better annotation accuracy, the firm said.
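The article does not describe how Goenomics implements this exclusion. As a rough illustration of the general idea only, the sketch below drops candidate gene calls that mostly overlap an annotated transposon. The `Interval` type, the function names, the 50 percent threshold, and the coordinates are all hypothetical and are not drawn from the firm's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A feature on one sequence, with half-open coordinates [start, end)."""
    seqid: str
    start: int
    end: int
    name: str


def overlap_fraction(gene: Interval, other: Interval) -> float:
    """Fraction of the gene's length covered by `other` (0.0 on different sequences)."""
    if gene.seqid != other.seqid:
        return 0.0
    shared = min(gene.end, other.end) - max(gene.start, other.start)
    return max(shared, 0) / (gene.end - gene.start)


def exclude_transposon_calls(genes, transposons, max_overlap=0.5):
    """Keep gene candidates whose overlap with every transposon is at most max_overlap.

    The 0.5 default is an arbitrary illustrative threshold, not a published value.
    """
    return [
        g for g in genes
        if all(overlap_fraction(g, t) <= max_overlap for t in transposons)
    ]


# Made-up example data: geneB lies entirely inside transposon TE1.
genes = [
    Interval("chr1", 100, 1100, "geneA"),
    Interval("chr1", 5000, 6000, "geneB"),
    Interval("chr2", 100, 1100, "geneC"),
]
transposons = [
    Interval("chr1", 4900, 6100, "TE1"),
    Interval("chr1", 1000, 1200, "TE2"),
]

kept = exclude_transposon_calls(genes, transposons)
print([g.name for g in kept])  # ['geneA', 'geneC']
```

In this toy case, geneB is removed because a transposon covers it completely, while geneA survives a 10 percent overlap with TE2. A production pipeline would need to identify the transposons first and would likely weigh additional evidence before discarding a call.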
Kollmar said his firm's services cover an unequaled mix of gene and organism types. In practice, that mostly means annotating genomes for plants and animals, as well as fungi. "We haven't addressed human [genomes] yet," he said. "It's the biggest community in the world, and everyone is improving annotation manually. We couldn't provide a huge benefit." Instead, he's looking to address "all the new genomes" for which there are no annotations.
That strategy has led to perhaps the firm's biggest vote of confidence so far. Last week, Dovetail Genomics announced that it is teaming up with Goenomics to help provide genome annotation as part of Dovetail's genome assembly services, which use Hi-C technology as scaffolding.
"The decision to partner with Goenomics stems from their ability to provide an expanded range of services, a robust and proprietary pipeline, and a high level of value for our customers," a Dovetail spokesperson said in an email.
"We do it much better and much faster than their previous supplier," Kollmar said, adding that he did not know which company Dovetail had dropped for Goenomics.
"Dovetail Genomics has utilized various independent consultants in the past for their annotation services," the company's spokesperson said, declining to disclose them.
The influx of genomes means that Goenomics is currently looking to expand beyond its two cofounders, Kollmar and Chief Technology Officer Dominic Simm. The pair founded the firm in 2021, turning their work on genome annotation software at the University of Göttingen into a spinout (its name is a portmanteau of its hometown and "genomics"). Though the company came out of the university, it has not licensed any technology from it. "There's no real option to protect software," Kollmar said, adding that the firm's code is rewritten every three or four years.
Kollmar is considering raising private financing to help scale the company, which would be a new experience. So far, Goenomics has relied on grants of approximately €1.3 million ($1.4 million) from government agencies to develop its technologies.
It launched with Mendle, its search engine for genome assemblies. "You can type in any organism — plant, animal, or fungus — or search for genome sequence data," Kollmar said. The company has curated genomic data from public sources, such as GenBank, but also "a lot of valuable and useful data that is in Figshare, Dryad repositories, at GitHub, and many other single databases."
Mendle is also the name of the firm's analytics platform for genome annotation. Goenomics offers seven annotation packages. Five are based on genomic features: protein-coding genes, functional annotation, RNA genes, untranslated regions (UTRs), and transposons. The other two offer annotation of five user-defined genes and an "enhanced analysis" report for the UTR package. Packages are priced based on feature types and genome size, so a 15 Gb wheat genome would cost more than a smaller one. For example, an RNA package, one of the "smaller" options, starts at about €750, Kollmar said.
Kollmar noted that Goenomics can work with any supplied genome, including pangenomes and low-coverage or other noisy genomes. The firm can also handle frameshifts and contamination, such as fungal genome data mixed in with a plant genome. Its service also includes ab initio gene discovery, similar to what Softberry's Fgenesh++ provides.
The ability to annotate novel genomes and cut through contaminated sequences makes ag-bio an important application area, with several ag-bio customers getting annotations through the Dovetail partnership. Excluding them, the firm has fewer than 10 customers, Kollmar said, including academic users. "We're not currently addressing pharma," he said, though he noted that "every animal model needs a genome, and every genome needs an annotation."
Outside of its work for customers, the firm is analyzing some of the approximately 30,000 unannotated genomes it has access to, with the hope that these annotations could be useful someday. The firm is working on a honeybee annotation as a free example for potential customers, as well as annotations for bat genomes and an oak species.
Though the firm doesn't actively offer human genome annotations, "we would if someone orders it," Kollmar said. "Even in the human genome, we don't know the total number of genes, and there are thousands of researchers working with humans. It's worse for every other genome."