CHICAGO (GenomeWeb) – Yesterday, ahead of next week's American Society of Human Genetics annual conference, startup sequencing analysis software developer Parabricks announced that it had won a Phase II Small Business Innovation Research grant from the National Science Foundation. The grant, worth just shy of $750,000, is meant to advance R&D on secondary genomic analysis.
Along with the SBIR grant, Ann Arbor, Michigan-based Parabricks also said this week that it has received $125,000 from the Michigan Emerging Technologies Fund, a network supported by the US Small Business Administration and by public and private interests in the state. Parabricks will apply that money to help commercialize its technology.
Next week at ASHG, the company plans to announce a new software product, but details are being withheld until then.
Phase I of the SBIR grant resulted in Parabricks teaming with DDN Storage to create an integrated graphics processing unit-based software and storage platform to speed up human genome analysis. The partners said that the offering could analyze 1,500 genomes per week, accelerating precision medicine workflows by 100x.
"With that, we started making some parts of our products, including a germline pipeline and a somatic pipeline," said Parabricks CEO and Cofounder Mehrzad Samadi.
In the new phase, Parabricks will build out its software in pursuit of an end-to-end platform for researchers and clinicians alike that the company believes will cut NGS secondary data analysis processing time to less than an hour using the Broad Institute's GATK4 pipeline.
"With Phase II, we are going to do population studies, joint genotyping, and deep learning-based variant calling," Samadi explained.
Samadi said there are two key components to population studies: finding variants in a single sample, then merging those variants in a database to find common patterns among populations.
"For example, if you are studying diabetes, you sequence 1,000 people with diabetes and find their individual variants first and combine these variants to find common variants," he explained.
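The two steps Samadi describes can be pictured with a minimal Python sketch. This is an illustration only, not Parabricks' software: the function names, the tuple layout for a variant, and the 50 percent threshold are all assumptions made for the example, and real joint genotyping is considerably more involved than counting shared calls.

```python
from collections import Counter

# A variant is represented here as (chromosome, position, ref, alt).
# This layout is an assumption for illustration, not a Parabricks format.


def count_variants(sample_calls):
    """Step 2a: merge per-sample calls and count how many samples carry each variant."""
    counts = Counter()
    for variants in sample_calls.values():
        # Use a set so a variant counts at most once per sample.
        counts.update(set(variants))
    return counts


def common_variants(sample_calls, min_fraction=0.5):
    """Step 2b: keep variants seen in at least min_fraction of samples.

    Returns a dict mapping each retained variant to its carrier fraction.
    """
    n_samples = len(sample_calls)
    if n_samples == 0:
        return {}
    counts = count_variants(sample_calls)
    return {
        variant: count / n_samples
        for variant, count in counts.items()
        if count / n_samples >= min_fraction
    }


if __name__ == "__main__":
    # Step 1 (per-sample variant calling) is assumed to have happened already;
    # these toy calls stand in for its output.
    cohort = {
        "sample_1": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
        "sample_2": [("chr1", 100, "A", "G")],
        "sample_3": [("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")],
        "sample_4": [("chr2", 200, "C", "T")],
    }
    for variant, fraction in sorted(common_variants(cohort).items()):
        print(variant, fraction)
```

In this toy cohort of four samples, the chr1 variant is carried by three and the chr2 variant by two, so both clear the assumed 50 percent cutoff, while the chr3 variant, seen once, does not.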
The Phase II work also will include machine learning and artificial intelligence. "We will try to incorporate a lot of the machine learning and deep learning into [our pipelines]," said the other Parabricks cofounder, Ankit Sethia, who now serves as the company's technical leader.
"One of the most exciting things for us going forward is the application of deep learning and AI in genomics," Sethia added. "We are doing the secondary analysis anyway. If we incorporate more of the deep learning and AI part, then we will have a really groundbreaking platform where people can do their computational analysis, can do their deep learning and AI, and can get everything done faster."
The speed comes from GPUs and related hardware supplied by partners including Nvidia, Dell, IBM, and Hewlett Packard Enterprise. With Parabricks' software, a single GPU server can replace 50 to 100 traditional servers, according to Samadi. "For big projects like population studies, when they are going after hundreds of thousands of genomes, we can do the same computations with far fewer servers," he said.
The AI also builds upon work done by third parties. Sethia said that Parabricks is adding components to the source code of Google's DeepVariant to adapt that variant caller for GPU use.
"DeepVariant has really, really good accuracy results," Sethia said. Parabricks is accelerating it. "We are re-engineering it to become really, really high-performance," he explained. "Once it is done, you [will be able to] run a secondary analysis using DeepVariant with our software suite much, much faster."
Samadi and Sethia met in the University of Michigan PhD program in computer science and engineering nearly a decade ago. They started the company in 2015 after acquiring expertise with graphics accelerators.
"We were starting to think that all the research we had done during our PhD and during all our research life, where can we apply it to make a significant impact?" Sethia said.
"We realized that computational genomics is a very good space because of all the precision medicine initiatives, all the population studies," he continued. "By bringing down the computing time, we're actually helping biology and medicine, which is something we could not have done in our research life," Sethia said.
"We found that genomics markets need this type of expertise because of [the presence of] lots and lots of data, and processing is taking a long time," Samadi said.
Their pre-Parabricks work involved looking for ways to accelerate data processing, typically by factors of 10x to 100x. "We realized that what we had developed could be used for end-to-end applications in many domains," Sethia said. "We zoomed in on one of the domains, which was computational genomics, and we made an end-to-end application for it."
Samadi suggested that the Phase I product was "just the tip of the iceberg." He said that every customer so far has not only remained loyal, but has come back to Parabricks with a list of software that takes a long time to run and thus could benefit from GPU-based acceleration.
"We are going one after another to provide full, complete software suites for everyone so they don't need to think about how long it takes them to do the computation. They just focus on their research," Samadi said.
While Parabricks now is focused on genomics, Samadi said that he is thinking about expanding into protein folding and multi-omic simulation. "You have this complete framework so you can run everything on the GPU," he said.
Parabricks now has eight full-time employees and two part-timers. Samadi wants the head count to reach about 15 by mid-2019.
Early funding came from public-private investment funds in the Detroit area and the state of Michigan. In addition to the grants, the company is now generating revenue from commercial sales, according to Samadi. He said that he is looking into a Series A venture capital round next year.