Diatom Software, a bioinformatics consultancy based in Holliston, Mass., is hoping to make a name for itself by offering software development services to pharmaceutical, biotech, and healthcare firms looking to outsource their bioinformatics needs.
The four-person company, which opened its doors in 2010, develops scientific software applications, tools, and databases for use by both researchers and bioinformatics specialists. Specifically, it develops solutions that help users integrate biological data from public and proprietary sources, make connections in their data, handle biological nomenclature standards, and use cloud technology.
As part of those efforts, the group has developed a portfolio of tools that it uses in its projects. These solutions are not available for sale, but Diatom is considering commercializing some of its offerings “as different companies take advantage of [them] and as we better understand some of the problems that arise” in the bioinformatics space, John Rachlin, Diatom’s co-founder and chief scientific officer, told BioInform this week.
If the company decides to start selling software, two potentially “viable” candidates for the market are its BioLink and CloudPCR tools, Rachlin said.
CloudPCR, a tool for designing multiplex PCR-based assays, grew out of a consulting project that Diatom did for Life Technologies’ Ion Torrent business. The project aimed “to develop protocols to improve the multiplexing of the sample preparation stage,” Rachlin said.
That project focused on developing cloud-based algorithms to design multiplex PCR experiments that could amplify exonic DNA in samples that are being prepped for sequencing on the Ion Torrent PGM, he said.
Designing these experiments in silico is “an incredibly hard computational problem because when you are trying to do multiplex PCR, there are all these problems with the formation of [things like] primer dimers,” he explained.
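To give a sense of why primer dimers make the design problem hard, the sketch below flags primer pairs whose 3' ends could anneal to another primer in the same pool. It is a deliberately simplified illustration and not Diatom's method: the function names, the five-base window, and the exact-match rule are all assumptions, and real design tools typically score partial matches and thermodynamic stability as well.

```python
# Illustrative only: a naive 3'-end complementarity screen for a primer pool.
# The window size and exact-match rule are assumptions, not CloudPCR's logic.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_dimer_risk(primer_a: str, primer_b: str, window: int = 5) -> bool:
    """True if the last `window` bases of primer_a can pair with a stretch of primer_b."""
    tail = primer_a.upper()[-window:]
    return reverse_complement(tail) in primer_b.upper()


def risky_pairs(primers: dict, window: int = 5) -> list:
    """List every pair of primer names (self-pairs included) with a 3'-end match."""
    names = sorted(primers)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i:]:
            if (three_prime_dimer_risk(primers[a], primers[b], window)
                    or three_prime_dimer_risk(primers[b], primers[a], window)):
                flagged.append((a, b))
    return flagged
```

Because every primer has to be checked against every other primer in the pool, the number of pairs to screen grows quadratically with the level of multiplexing, which is one reason the full design problem becomes expensive.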
CloudPCR is built on a generic cloud optimization framework developed by Diatom that’s used for “solving any kind of multi-objective decision support problem” where “there is no single best algorithm or solution,” Rachlin explained. This infrastructure and its algorithms were used to sift through and compare candidate experimental designs in the Ion Torrent project, he said.
“What the cloud optimization framework is doing basically is allowing us to ... integrate different problem-generating solvers into a common framework, to distribute those solvers across different servers running on Amazon's EC2 cloud, and then generating many different solutions that allow users to see the tradeoffs,” he said.
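One common way to expose such tradeoffs when there is no single best solution is to keep only the candidates that no other candidate beats on every objective, often called a Pareto front. The sketch below shows that idea under stated assumptions: the `Design` fields are hypothetical objectives chosen for illustration, all are treated as lower-is-better, and the distribution of solvers across servers is left out entirely. Nothing here is taken from Diatom's framework.

```python
# Illustrative only: filter candidate designs down to a Pareto front.
# The objectives below are hypothetical; all are "lower is better".
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    dimer_pairs: int      # predicted primer-dimer pairs
    uncovered_exons: int  # target exons the design fails to amplify
    pools: int            # number of separate reaction pools needed


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    a_obj, b_obj = a[1:], b[1:]  # skip the name field
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and better


def pareto_front(designs: list) -> list:
    """Keep the designs that no other design dominates, in input order."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]
```

A user looking at the surviving designs can then decide, for example, whether fewer reaction pools are worth a few more uncovered exons, rather than being handed one answer.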
A second tool, BioLink, is a framework for constructing integrated biological databases that “allow scientists to ask deeper questions,” Rachlin said. It provides a solution for integrating, querying, and analyzing data and links both public and proprietary data sources.
A researcher could, for example, use BioLink to find information about which genes are associated with a particular disease pathway and, further, which of those genes are targets for biochemical assays and which compounds are active against those genes, he said.
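In an integrated database, a question like this typically becomes a join across tables that originated in different sources. The sketch below uses Python's built-in `sqlite3` module with an in-memory database to show the shape of such a query. The table names, column names, and sample rows are invented placeholders, not BioLink's schema or real data.

```python
# Illustrative only: a pathway -> gene -> assay -> active-compound query.
# Schema and rows are hypothetical placeholders, not BioLink's data model.
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with three linked tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pathway_gene (pathway TEXT, gene TEXT);
        CREATE TABLE assay (assay_id TEXT, target_gene TEXT);
        CREATE TABLE activity (assay_id TEXT, compound TEXT, active INTEGER);
        INSERT INTO pathway_gene VALUES
            ('pathway_X', 'GENE1'), ('pathway_X', 'GENE2'), ('pathway_Y', 'GENE3');
        INSERT INTO assay VALUES ('assay_1', 'GENE1'), ('assay_2', 'GENE3');
        INSERT INTO activity VALUES
            ('assay_1', 'cmpd_A', 1), ('assay_1', 'cmpd_B', 0), ('assay_2', 'cmpd_C', 1);
    """)
    return conn


def active_compounds_for_pathway(conn: sqlite3.Connection, pathway: str) -> list:
    """Return (gene, assay_id, compound) rows for active compounds in one pathway."""
    cursor = conn.execute(
        """
        SELECT pg.gene, a.assay_id, act.compound
        FROM pathway_gene AS pg
        JOIN assay AS a ON a.target_gene = pg.gene
        JOIN activity AS act ON act.assay_id = a.assay_id
        WHERE pg.pathway = ? AND act.active = 1
        ORDER BY pg.gene, act.compound
        """,
        (pathway,),
    )
    return cursor.fetchall()
```

In this toy data, GENE2 belongs to the pathway but has no assay, so it drops out of the result; only genes that are both in the pathway and targeted by an assay with an active compound are returned.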
The tool uses a number of scripts to pull in and parse information from public resources such as PubChem and ClinicalTrials.gov, among others, but the Diatom team can include data from other public repositories as needed, Rachlin said.
BioLink also “enables us to define certain standards about how the data is going to be rendered in our database as well as a variety of user interface components that enable us to build a nice standardized system for exploring and visualizing that data,” Rachlin said.
A final tool in Diatom’s kit is Transom, which is a “nomenclature translation” tool for managing biological identifiers, names, synonyms, and mappings from different database sources, Rachlin said.
The tool lets users manage synonyms and look up translations quickly, he said. It also provides application programming interfaces that support companies’ internal development projects.
For example, a researcher who has 10,000 Entrez gene IDs and wants the corresponding UniProt accession numbers can submit the list through Transom’s web services API. The tool then maps the genes to their UniProt numbers and returns the results in “a few seconds,” he said.
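The core of such a batch translation can be pictured as a lookup against a precomputed mapping table. The sketch below is a minimal stand-in and not Transom's API: the function name, the dictionary-based mapping, and the handling of unmapped IDs are assumptions. The three sample entries are hand-entered for illustration, and a real table would be loaded from the source databases.

```python
# Illustrative only: batch translation of Entrez gene IDs to UniProt accessions.
# `translate_ids` and the in-memory mapping are hypothetical, not Transom's interface.

# A tiny hand-entered sample (TP53, BRCA1, EGFR); one gene may map to several accessions.
SAMPLE_MAPPING = {
    "7157": ["P04637"],
    "672": ["P38398"],
    "1956": ["P00533"],
}


def translate_ids(gene_ids, mapping):
    """Split a list of gene IDs into translated entries and IDs with no known mapping."""
    mapped = {}
    unmapped = []
    for gene_id in gene_ids:
        key = str(gene_id).strip()  # accept ints or strings with stray whitespace
        accessions = mapping.get(key)
        if accessions:
            mapped[key] = list(accessions)
        else:
            unmapped.append(key)
    return mapped, unmapped
```

Reporting the unmapped IDs separately, rather than silently dropping them, lets the caller see which identifiers are retired or missing from the mapping source.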
In addition to its work with Life Tech, Diatom has worked in the past on the Computational Bridge to Experiments, or COMBREX, project, which is focused on generating and validating gene function predictions in prokaryotes (See related story this issue).
For the project, Diatom developed infrastructure to manage users, predictors, and experimentalists as well as a mechanism that interested experimentalists could use to submit grant proposals to the project to test particular gene predictions, Rachlin said.
Currently, the firm is working with the US Centers for Disease Control and Prevention in collaboration with Epidemico, a public health firm, to redesign and re-implement components of the CDC’s National Electronic Disease Surveillance System in order to improve the tool’s message validation, storage, analysis, and reporting.
Pricing for Diatom’s services varies from customer to customer, but Rachlin said that the group’s rates are “fairly cost competitive” compared to what a company might pay if it needed to hire a bioinformatics team on a permanent basis.
Rachlin declined to name any specific competitors for Diatom, stating that other bioinformatics consulting companies such as Eagle Genomics and BioTeam are actually potential partners.
Currently, the company is trying to secure additional contracts and, if it succeeds, it will hire more Java developers with bioinformatics backgrounds.
Diatom expects the bulk of its business to come from small startups and midsized companies that lack in-house bioinformatics staff and expertise, although it expects to get some business from larger groups like the CDC that are looking to outsource some of their bioinformatics needs.
Diatom also intends to retain its consultancy model even after it begins to sell licenses to its software, Rachlin said.
With “a lot of these technologies … there are always nuances and customizations that are required and I think companies would benefit from enabling us to work with them,” he said.
For example, in the area of data integration Diatom could help customers identify candidate data sources to be integrated with their proprietary data, he said.
Additionally, there will still be a market for providing consulting and solutions to customers that lack internal bioinformatics capabilities or those who require informatics expertise on a short-term basis, he said.