BOSTON — Bina Technologies is hoping to make its mark in the next-generation sequence data analysis space by offering a complete data analysis pipeline for both cloud and local compute infrastructures.
The Redwood City, Calif.-based company emerged from stealth mode this week at the Bio-IT World conference with a preview of a variant analysis pipeline, dubbed SeqAlto, that runs on the company's Bina Box appliance. The appliance is a hardware system comprising graphics processing units, field programmable gate arrays, and multicore central processing units that has been optimized for bioinformatics analysis, Narges Bani Asadi, the company's co-founder and CEO, explained to BioInform.
Once SeqAlto calls variants on the appliance, users can then move the sequence information into the Bina Cloud environment and run further analysis on it as well as manage, aggregate, and share information, she said. Bina Cloud is based on Amazon Web Services and includes tools for managing and sharing information and version control.
The company, which was founded about a year ago, is built on a “hybrid approach” to computing in which some analysis steps are run locally on an accelerated appliance, while others are better performed in the cloud, Asadi explained. The company also relies on a “holistic” optimization approach based on input from computer science, computational genomics, and electrical engineering, she said.
Bina plans to launch the complete pipeline — called the Bina Analysis Platform — later this year. Although it will compete with a growing number of NGS software vendors, as well as a number of companies offering cloud-based analysis services, the company is confident that its combined cloud and local hardware offering will be a key differentiator in the marketplace.
The SeqAlto pipeline includes a proprietary alignment algorithm; tools from the Broad Institute's Genome Analysis Toolkit that have been optimized to improve computation time and accuracy; and a bespoke algorithm for identifying copy number variations.
The pipeline accepts raw human genome sequence from Illumina HiSeq and MiSeq sequencers, performs the sequence alignment, recalibration, and realignment, and calls SNPs, insertions and deletions, structural variants, and copy number variants.
Bina plans to eventually include additional analysis capabilities in its pipeline including algorithms to analyze RNA-seq and ChIP-seq data, she added.
Asadi and collaborators at Stanford University plan in the next few weeks to publish a paper that will describe in detail a generic version of the alignment algorithm, as well as a separate paper describing the CNV algorithm, which, she said, is up to 30 times faster than open source tools like CNVnator.
Future incarnations of the platform will include tools to handle multiple samples in comparative genomics experiments, for example, as well as tools that will allow researchers to use sequence, protein, or gene expression data to develop models for disease, she added.
Bina is currently accepting applications from potential customers for a pilot program, which it plans to kick off in a few weeks, that will put the platform through its paces prior to full commercialization.
Asadi said the company hopes to make the product “as perfect as possible” before its official launch later this year.
Bina is interested in applications from scientists using NGS data for basic research, applied markets, clinical applications, diagnostics development, and biopharmaceutical development so that it can get a sense of the market needs and the best customers for its platform, she said.
Bina expects to pick a final list of customers it will work with in the next few weeks. It also expects to conclude the pilot project in the early fall and then bring the product to market officially by mid-fall.
So far, Asadi said, the company has drummed up some interest from genome centers and diagnostic labs, although she did not disclose which groups those are.
The company isn’t disclosing its pricing until it is ready to commercialize the product, she said.
Meanwhile, Bina is focusing on growing its business and is looking to add to its 15-person staff, Asadi said.
Specifically, it is looking for software engineers who are familiar with cloud computing, distributed software computing, and high-performance computing, as well as bioinformaticians and biostatisticians who “understand genomics,” she said.
Stepping Into the Limelight
Over the last year, Bina has been prepping its SeqAlto pipeline and hardware infrastructure for its market debut, Asadi said.
Asadi began working on the algorithms underlying Bina’s technology while she was a doctoral student in electrical engineering at Stanford. She explained that while there, she and colleagues developed the “Bina language” — which is responsible for the improved speeds associated with running software on the company’s hardware infrastructure — in an attempt to provide a tool that would support multidisciplinary projects involving researchers in medical schools, statisticians, and computer scientists.
Her company now applies the same approach to address NGS data issues, she said.
The Bina Box can be configured to include as many nodes as needed, depending on customers’ data analysis needs. A default box comes equipped with four nodes, each with 96 gigabytes of random access memory.
In one scenario discussed at Bio-IT World, the company ran its SeqAlto pipeline on AWS and compared it to two other variant analysis and alignment pipelines that were also run on the cloud: Stampy paired with the Genome Analysis Toolkit (GATK), and the Burrows-Wheeler Aligner (BWA) paired with GATK.
The company used all three pipelines to analyze a whole human genome at 30X coverage on a single Amazon instance with 8 cores and 68 gigabytes of random access memory.
Bina reported that the Stampy/GATK pipeline took more than 140 hours to complete the analysis, the BWA/GATK pipeline took more than 120 hours, and the SeqAlto pipeline took around 60 hours.
The company also ran the SeqAlto pipeline on the same data on the default Bina Box, which completed the analysis in about two hours.
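For readers who want to see what those figures imply, the relative speedups can be worked out with a short calculation. The sketch below is illustrative only: it uses the approximate run times Bina reported (several were given as "more than" or "about"), and the dictionary and function names are hypothetical, not part of any Bina software. The Bina Box result also comes from a four-node appliance rather than a single 8-core cloud instance, so that ratio is not a like-for-like hardware comparison.

```python
# Approximate wall-clock times in hours, as reported by Bina at Bio-IT World.
# Several figures were stated as "more than" or "about", so treat the
# resulting ratios as rough estimates rather than measurements.
reported_hours = {
    "Stampy/GATK (AWS)": 140,
    "BWA/GATK (AWS)": 120,
    "SeqAlto (AWS)": 60,
    "SeqAlto (Bina Box)": 2,
}


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster `candidate` ran than `baseline`."""
    return reported_hours[baseline] / reported_hours[candidate]


if __name__ == "__main__":
    # Same cloud instance, different pipelines.
    for name in ("Stampy/GATK (AWS)", "BWA/GATK (AWS)"):
        ratio = speedup(name, "SeqAlto (AWS)")
        print(f"SeqAlto on AWS vs {name}: roughly {ratio:.1f}x faster")

    # Same pipeline, cloud instance versus the four-node appliance.
    ratio = speedup("SeqAlto (AWS)", "SeqAlto (Bina Box)")
    print(f"Bina Box vs SeqAlto on AWS: roughly {ratio:.0f}x faster")
```

On these numbers, SeqAlto on the cloud instance ran about twice as fast as the two comparison pipelines, and moving the same pipeline to the appliance cut the run time by a further factor of about 30.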
“We looked at the [analysis] problem from all angles and really tried to optimize it,” Asadi explained. One advantage, she noted, is that the company is using its own algorithms, rather than trying to accelerate an “off-the-shelf algorithm.”