Noblis Readies Internal Pathogen Identification Software for Commercialization


NEW YORK (GenomeWeb) – This summer, Falls Church, Virginia-based Noblis plans to commercialize its proprietary platform for analyzing and detecting variations in whole-genome sequence from microbial and human samples for various public health applications.

Noblis is a not-for-profit science and technology research firm with over 1,200 employees that works largely with the US federal government and academia on projects in the areas of national security, health innovation, enterprise engineering, energy and environment, intelligence, and transportation.

The company's genomics efforts, which fall under a few of those categories, revolve around exploring large and complex genomic datasets in the context of infectious disease studies, food-borne pathogen identification, and more. Researchers at the company began developing BioVelocity in 2011 to provide a more efficient and rapid tool, initially for exploring large metagenomics datasets, than was available at the time, Sterling Thomas, director of the Data Analytics Center of Excellence, National Security & Intelligence, at Noblis, told GenomeWeb.

Based on read mapping technology, BioVelocity maps input sequences to a large indexed database of reference genome libraries — customized according to the samples being analyzed — and scores alignments based on similarities between sequences to identify species present in the samples and their respective percentages in the sample, Thomas explained. The software also identifies variations such as SNPs and insertions and deletions to provide insights into features such as the evolutionary pressures on the pathogen and the speed of mutation. BioVelocity's reference libraries draw on datasets primarily from standard open-source repositories such as those maintained by the National Center for Biotechnology Information, Thomas said, but they also draw on data from previous projects with government and academia. In addition, the company is reaching out to industry to try to obtain data on food-borne pathogens that it can add to the system.
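To make the general idea of indexed read mapping concrete, here is a minimal Python sketch. It is not BioVelocity's implementation, which Noblis has not published in this article. It assumes a simple k-mer index, a score equal to the fraction of a read's k-mers found in a reference, and made-up sequences; the names `build_index`, `classify_read`, and `species_percentages` are hypothetical.

```python
from collections import Counter, defaultdict

K = 4  # toy k-mer length (assumption); real tools use much longer k-mers


def build_index(references, k=K):
    """Map every k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def classify_read(read, index, k=K):
    """Return (best_reference, score) for one read.

    The score is the fraction of the read's k-mers found in the best
    reference. Returns (None, 0.0) when no k-mer matches.
    """
    n_kmers = len(read) - k + 1
    if n_kmers <= 0:
        return None, 0.0
    hits = Counter()
    for i in range(n_kmers):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    if not hits:
        return None, 0.0
    # Break ties deterministically by reference name.
    name, count = max(hits.items(), key=lambda kv: (kv[1], kv[0]))
    return name, count / n_kmers


def species_percentages(reads, index, k=K):
    """Share of assigned reads per reference, as percentages."""
    assigned = Counter()
    for read in reads:
        name, _ = classify_read(read, index, k)
        if name is not None:
            assigned[name] += 1
    total = sum(assigned.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in assigned.items()}


# Made-up reference sequences and reads, for illustration only.
references = {
    "species_a": "ACGTACGTAGCTAGCTAG",
    "species_b": "TTGACCATTGGCCAATGG",
}
index = build_index(references)
reads = ["ACGTACGT", "CCATTGGC", "GCTAGCTA", "GGGGGGGG"]
percentages = species_percentages(reads, index)
```

In this toy run, two reads are assigned to `species_a`, one to `species_b`, and the last read matches nothing and is left out of the percentages. A production system would also need to handle reverse complements, sequencing errors, and reads shared between closely related strains.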

BioVelocity can generate a single high-quality assembly that includes annotations from all the reference genomes that the sample was compared to, or it can provide several lower-quality assemblies that show all the different variations between the sample and the reference genomes, Thomas said. For a metagenomics sample, the software returns a list of all the different species and strains that are present within the datasets and also offers statistics about the number and coverage of the reads. It also includes a confidence score that reflects how well the sample sequences aligned to the reference genome. The software was initially designed to run metagenomics samples, but it can also be used to analyze human samples.

Noblis claims that based on internal benchmarks its software generates results up to 50 times faster than standard short-read mapping methods like Bowtie. According to numbers provided by the company on its site, BioVelocity is able to align reads and identify SNPs in a sample in about 12 minutes. The software was also able to analyze 27 million metagenomics sequence reads and identify the microbial species in the sample in just 37 seconds.

The software's speed is due in part to the indexing technology used to create the reference database, which makes the database fast to search, but performance is also boosted by the software's ability to parallelize tasks over multiple computing threads. Within Noblis, the software runs on a four-terabyte Cray supercomputing system owned by the company, which provides ample space and compute for holding the reference database in memory as well as for running jobs in parallel. But the developers have also adapted the software to perform equally well on commodity hardware by utilizing a few shortcuts that make it possible to temporarily write parts of the database to disk, thereby reducing the memory footprint, Thomas explained.
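As a rough illustration of spreading lookups against an in-memory index over multiple threads, the Python sketch below splits a set of reads into chunks and checks each chunk in its own worker. This is an assumption-laden toy, not a description of how BioVelocity schedules work: the function names `count_hits` and `parallel_count`, the k-mer set, and the reads are all hypothetical. Note also that CPython threads will not speed up CPU-bound work like this, so the sketch shows only the structure of the approach.

```python
from concurrent.futures import ThreadPoolExecutor


def count_hits(chunk, index, k):
    """Count reads in the chunk that share at least one k-mer with the index."""
    hits = 0
    for read in chunk:
        if any(read[i:i + k] in index for i in range(len(read) - k + 1)):
            hits += 1
    return hits


def parallel_count(reads, index, k=4, workers=4):
    """Split reads into roughly equal chunks and count hits per chunk in threads."""
    size = max(1, -(-len(reads) // workers))  # ceiling division
    chunks = [reads[i:i + size] for i in range(0, len(reads), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda chunk: count_hits(chunk, index, k), chunks))


# Made-up k-mer index and reads, for illustration only.
kmer_index = {"ACGT", "CGTA"}
sample_reads = ["ACGTAA", "TTTTTT", "GGCGTA", "CC"]
matched = parallel_count(sample_reads, kmer_index, k=4, workers=2)
```

Because the index is only read, never modified, the workers can share it without locking, which is one reason read-only indexed lookups parallelize easily.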

Since developing BioVelocity, Noblis has run it on internal infrastructure, providing it as a service for research projects with the federal government and academia focused on infectious diseases and other areas. The software was used, for example, in a study, described in a paper published in Viruses, that focused on the 2014 Ebola virus and how changes in the genomic profiles of the virus could affect the sensitivity of existing diagnostic assays.

But the costs of maintaining and improving software aren't trivial, and commercializing the software will enable the not-for-profit firm to support BioVelocity's continued development, Thomas said. A commercial offering would also broaden the scope of potential users that would have access to the software and would be able to offer feedback that could be used to further improve the solution.

Thomas told GenomeWeb that the company is currently considering various mechanisms for making the software available on the cloud. Initially Noblis will offer web-based access to software running on its own internal supercomputing infrastructure. Over time, it will move customers to its own private cloud with the intent of possibly moving the tool out to a commercial cloud vendor at a later date, Thomas said. The company has not selected a vendor at this time.

Besides the cloud version, Noblis is also working with hardware vendors to possibly release a BioVelocity appliance by the end of the summer, Thomas said. The company is also still discussing an appropriate pricing strategy for a commercial version of BioVelocity.

The planned software will offer the same capabilities as Noblis' internal system but will feature a more user-friendly interface, as opposed to the command line interface it uses internally. The company has given some of its government and academic partners access to the commercial tool and the opportunity to test it for themselves, but a lot still needs to be done in terms of getting different kinds of users to test the system, Thomas said. "I'm anticipating that once this rolls out into the market there'll be a series of updates that improve the user interface and those types of things."

When BioVelocity goes to market, it will have to compete with offerings from companies such as OneCodex, a San Francisco-based startup that is working to commercialize a similar system for identifying pathogens from NGS data. What sets BioVelocity apart from the competition, according to Thomas, is its ability to build a reference database tailored to the analyses in question in a matter of minutes. In contrast, other systems can take days to build the index needed for the alignment step.

Another advantage of the system is that as each sequence in an input sample is analyzed, users can add it to the reference database and compare it to other sequences within the same sample, he added.

Noblis is also working on a second genomics-centric application focused on data transfer, Thomas said, but he declined to provide details about that particular offering at this time.