DNAStar has received a $152,000 Phase I Small Business Innovation Research grant from the National Human Genome Research Institute to build a metagenomic sequence assembly and analysis pipeline that will eventually be available to customers of the company's SeqMan NGen software.
The pipeline will be designed to analyze data from next-gen sequencing platforms and will build on existing metagenomics data analysis capabilities that the Madison, Wis.-based company has developed, Tom Schwei, DNAStar's vice president and general manager, told BioInform this week.
These include applications that can "parse out the data from different organisms" at the outset of an analysis project, for example, Schwei said.
Additionally, algorithms such as the company's reference-guided sequence assembly algorithm, which were developed to manage and analyze large datasets on desktop computers, can also be applied to metagenomics problems, he said.
The grant is "really taking our current capability to a new level as far as the magnitude of the types of problems we can solve," Schwei said.
Specifically, during the six-month Phase I grant period, DNAStar aims to determine whether its proprietary non-memory-bound assembly engine for reference-guided assemblies, dubbed XNG, can meet the challenges of metagenomic sequencing using NGS data.
The XNG engine was included in the third version of the company's SeqMan NGen, released earlier this year. The tool is based on a proprietary algorithm that helps users sort through and compare datasets (BI 02/04/2011).
According to the grant abstract, the company aims to test XNG's ability to remove contaminating DNA from samples; to allocate reads into appropriate phylogenetic bins based on matches to a local reference genome database; and to convert genome sequences from multiple strains of a given species into a single annotated entry or pan-genome.
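As a rough illustration of the contaminant-removal and read-binning aims listed in the abstract, the sketch below assigns each read to whichever reference sequence shares the most k-mers with it and discards reads that best match a contaminant. It is not DNAStar's XNG algorithm, which is proprietary and not described here; the k-mer approach, the function names, and the parameters `K`, `min_hits`, and `contaminants` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

K = 8  # k-mer length; an illustrative value, not taken from the article


def kmers(seq, k=K):
    """Return the set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=K):
    """Map each k-mer to the set of reference labels that contain it."""
    index = defaultdict(set)
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def assign_read(read, index, k=K, min_hits=3):
    """Return the label sharing the most k-mers with the read, or None."""
    hits = Counter()
    for kmer in kmers(read, k):
        for label in index.get(kmer, ()):
            hits[label] += 1
    if not hits:
        return None
    label, count = hits.most_common(1)[0]
    return label if count >= min_hits else None


def bin_reads(reads, references, contaminants=(), k=K, min_hits=3):
    """Group reads by best-matching reference; drop contaminant matches."""
    index = build_index(references, k)
    bins = defaultdict(list)
    for read in reads:
        label = assign_read(read, index, k, min_hits)
        if label in contaminants:
            continue  # discard reads that best match a contaminant
        bins[label if label is not None else "unassigned"].append(read)
    return dict(bins)
```

Here `references` is a hypothetical dictionary mapping a label (a taxon or a known contaminant such as host DNA) to its sequence. A production pipeline would also need to handle sequencing errors, reverse complements, and reference databases too large to index in memory this way.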
It is likely that DNAStar will have to develop additional components to address tasks like data organization as well as methods of constructing a pan-genome, Schwei said.
DNAStar expects the metagenomic pipeline to appeal to scientists who analyze mixed populations and need to sort through different types of organisms, Schwei said, particularly because the field lacks good tools for this purpose.
Currently, metagenomics researchers "cobble together various combinations of software tools" to analyze large quantities of next-generation sequence data that come with differing read lengths, error models, and unique formats, DNAStar noted in its grant abstract. Furthermore, it added, "most software that can handle these large, complex data sets also requires substantial computing resources and computer expertise beyond that of a normally equipped lab."
Schwei suspects that other groups are attempting to develop similar solutions, but isn't worried about competition.
"We believe that our approach will be superior both with regards to speed and accuracy," he said.
DNAStar is currently working with several undisclosed collaborators who are providing sample datasets, and the firm plans to provide a prototype of the pipeline to these users by this fall, Schwei said, though he declined to give a timeline for its commercial release.
Once available, all SeqMan licensees will have access to the metagenomics pipeline.
Meanwhile, DNAStar will continue to add new features to the pipeline following the initial release.
"The grant ... [is] a phase one, it's not the end of the line," he said.
DNAStar will consider applying for a Phase II SBIR depending on how this development round goes, Schwei said, adding that it would be "premature" to discuss plans for future phases at this time.
However, one possibility for future development rounds would be to develop tools for analyzing de novo metagenomes, he said.