NEW YORK (GenomeWeb) – Newly minted health data startup SolveBio is seeking to build a business around its informatics platform, which is designed to help scientists who develop diagnostic and research applications in hospitals, academia, and industry access and use genomic and other kinds of data more easily.
Basically, SolveBio exists "to make data more easily accessible to developers … in hospitals, companies, and research institutions [who are] building applications using genomic data [and] have a hard or an expensive time getting all these external data sources," SolveBio's CEO Mark Kaganovich explained to BioInform. This platform, which shares the company's name, is intended to help researchers in these contexts circumvent the repetitive and frequently frustrating cycle of "downloading, parsing, and rearranging [external] data," he said. "We've used new technology to index the data in a way that makes it programmatically accessible [in a way that’s] faster and simpler" and that frees the researchers up to focus on using the data "to build models and do research or diagnoses."
On cloud-based infrastructure, SolveBio hosts collated, curated, and versioned data from public and proprietary resources covering genes, proteins, metabolites, drug compounds, clinical trials, variant-disease relationships, and more. The list of source repositories includes many of the usual suspects: the 1000 Genomes Project, The Cancer Genome Atlas, ClinVar, and Online Mendelian Inheritance in Man. The data is stored and used in a secure environment complete with access controls, full audit trails, and encryption.
Customers integrate the data into their internal applications and workflows through application programming interfaces (APIs) that the company has developed; language bindings are currently available for Python and JavaScript, with bindings for Ruby and R to come. This is simpler and more efficient than, for instance, downloading and re-indexing entire TCGA datasets each time the data is needed for analysis, Kaganovich said.
The system is currently being tested in a private beta with a number of undisclosed partners, and the testing round is expected to last six months. SolveBio's pricing model is similar to the one used by Amazon Web Services: customers pay more when they use more resources, for example by running larger queries that span more datasets or by adding security features. The data itself is free; customers pay only for use of the infrastructure. The company is not disclosing specific prices at this time.
In addition to hosting existing datasets, SolveBio also hopes to act as a data broker for hospitals and research institutions that are willing to make their internal data available on the company's platform. "Our [main] business is the infrastructure behind accessing those datasets but we think that we really create a lot of value for people if we also help distribute proprietary datasets," Kaganovich said. "There are already lots of datasets that people make licenses for and that … could be really interesting because if you could generate a value network for people to get paid if they expose data, then you'll end up incentivizing more and more data production." The company is still refining the mechanism for this process with select partners, aiming to strike the right balance among data quality, privacy, and security, he said.
SolveBio joins the ranks of companies whose portfolios include products that make genomic data easier to access and use. On the research side, for instance, InSilico Genomics hosts public sequence data and offers paying customers the opportunity to compare their internal data with information on its platform culled from resources such as TCGA and the Gene Expression Omnibus, and to analyze it with open source tools. DNAnexus, meanwhile, provides access to a version of the National Center for Biotechnology Information's Sequence Read Archive.
SolveBio believes its offering will be useful in both research and clinical settings, particularly the latter, where there is still much to be done to make data more accessible, Kaganovich said. "Our pitch … is we want to help programmers focus on the stuff that’s actually productive — building models, making the graphs, and integrating new data into their models," he said. In terms of the market, SolveBio sees infrastructure cobbled together by internal IT teams as its main competition. It is betting that potential customers will see its product as a far more cost-effective and efficient alternative to the do-it-yourself approach. "People spend a third to half of their time on this kind of thing and if we could just shrink that, it would free up so much capital for people to actually be doing value-add stuff that these organizations have to do," he said.
SolveBio has raised $2 million in seed funding from Andreessen Horowitz, Max Levchin (who also sits on SolveBio's board), SV Angel, Nat Turner, Zach Weinberg, Charlie Cheever, and others. Kaganovich told BioInform that the company will use the funds to increase its headcount, including hiring additional software engineers.
Kaganovich, who is currently completing a doctoral degree in genomics at Stanford University, co-founded the company last June with David Caplan, who holds degrees in bioinformatics and biochemistry from the University of Calgary and the University of Toronto, respectively; Paul George, who holds electrical engineering degrees from Cornell University; and David Gross. Caplan is the company's chief technology officer, George handles engineering, and Gross is the company's head of design and user experience.