NEW YORK (GenomeWeb) – UK-based software firm Repositive has received a SMART grant of £50,000 (about $78,100) from Innovate UK, which it will use to develop its first commercial product for the genomics arena — an application that will help researchers make use of datasets containing sensitive information without compromising the security of the data and the privacy of its contributors.
Repositive has already developed and is currently beta testing a free so-called data-discovery platform that uses metadata provided by developers of public, semi-public, and proprietary data repositories to make their genomic datasets more visible to potential users. Its intent is to make it easier for researchers to access data needed for their projects and to provide a forum for initiating collaborations with colleagues in the community. In March, Repositive raised £300,000 (about $446,000) in seed funding, which it is using to support the development of this platform ahead of a full launch planned for September this year.
Meanwhile, the new funds from Innovate UK will support the development and commercialization of novel privacy-preserving technologies that would allow users to run queries on genomic datasets containing sensitive information and extract basic high-level statistics from these datasets without requiring levels of access that could compromise the privacy of the individual genome contributors, Repositive CEO Fiona Nielsen told GenomeWeb. "It's the intermediate technology that allows you to put a layer on top of data [and] make queries to that layer that allows you to extract statistics [without seeing] the sensitive data that lies underneath," she said.
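To make the idea of a query layer concrete, here is a minimal Python sketch of a wrapper that answers only aggregate questions and never returns individual records. It is an illustration under stated assumptions, not Repositive's technology: the `Record` fields, the `AggregateLayer` class, and the `min_cohort` suppression threshold are all hypothetical, and withholding results for small groups is only one simple safeguard among several that a real privacy-preserving system would need.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical per-donor record held by the data custodian."""
    donor_id: str
    has_variant: bool
    age: int


class AggregateLayer:
    """Hypothetical layer that exposes statistics, never the records."""

    def __init__(self, records: Iterable[Record], min_cohort: int = 5):
        self._records = list(records)   # stays behind the layer
        self._min_cohort = min_cohort   # assumed suppression threshold

    def count(self, predicate: Callable[[Record], bool]) -> Optional[int]:
        """Number of matching donors, or None if the group is too small."""
        n = sum(1 for r in self._records if predicate(r))
        return n if n >= self._min_cohort else None

    def mean_age(self, predicate: Callable[[Record], bool]) -> Optional[float]:
        """Mean age of matching donors, or None if the group is too small."""
        ages = [r.age for r in self._records if predicate(r)]
        if len(ages) < self._min_cohort:
            return None
        return sum(ages) / len(ages)


# Toy data: ten donors, the first six of whom carry the variant.
records = [Record(f"d{i}", has_variant=(i < 6), age=40 + i) for i in range(10)]
layer = AggregateLayer(records, min_cohort=5)

carriers = layer.count(lambda r: r.has_variant)   # large enough to report
oldest = layer.count(lambda r: r.age > 47)        # only two donors: withheld
```

In this sketch a researcher learns how many donors carry the variant, but a query that would single out a couple of individuals returns nothing.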
Repositive intends to use the current funds to develop a proof-of-concept tool over the next six months. It will then seek additional funding to support the development of the actual product, which it expects to complete in about eight months, Nielsen said. "We think we still need to go through another round of investment before we've built our products to the point where they are marketable and we can start making real sales," she said. "We expect that we need a round of investment to support us for at least another year and a half before we'll start getting significant income from sales."
Once the technology development stage is complete, Repositive will put the new query tool through its paces in a private beta prior to a public launch that could happen sometime in 2017. When the product does eventually go to market, Repositive will offer it as a paid service, as part of its existing platform, to large consortia, pharmaceutical companies, and other custodians of large databases who want to make their data available to collaborators or customers but have been unable to do so because of privacy concerns, the difficulties of moving extremely large datasets around, and issues with maintaining multiple copies of the same data in different locations, Nielsen told GenomeWeb. The plan, she said, is to offer the query service for a yet-to-be-determined subscription fee that would cover the number of times a customer wants the service to be operational.
Repositive is a spinout from UK-based charity DNAdigest. Nielsen launched both the charity and company — after leaving a position at Illumina — with the same mission in mind: to develop mechanisms that would improve access to data and facilitate more efficient data sharing. At Illumina, Nielsen researched genetic variants in cancer and also worked on building diagnostic and analysis tools focused on oncology. "We [were] producing lots of data but that data was not available to anybody else and we were not able to access anyone else's data," she told GenomeWeb. "Interpretation is only possible if you've got lots of data to compare it to ... so no matter how much work I would do analyzing individual genomes, it wouldn't make a large impact because we would never reach this point of precision medicine if we would not have access to large amounts of data [for comparison]."
DNAdigest was formed to promote both good data sharing practices and existing tools and organizations that support sharing, she said. Repositive forms the other side of this coin by offering a platform for making genomic datasets more visible as well as providing a forum for initiating collaborations around shared research interests.
Just as an individual might use the TripAdvisor website to choose a hotel that best suits their needs, Repositive aims to act as an intermediary that tells researchers what datasets are available to them and helps them filter through a number of possible options to select the sources that best suit their purposes. Researchers can use a simple interface to search through publicly available human genome datasets and identify which ones are most relevant for their projects and studies. The company has sourced and indexed these datasets from multiple repositories using metadata provided by developers, and descriptions of the datasets are contained in a repository.
Repositive does not host datasets internally. Rather, its platform provides links to the repositories where the data is located so that users can access and download datasets directly from the source if they are public, submit access applications for restricted datasets in repositories such as dbGaP, or purchase licenses in the case of commercial databases.
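The three access routes described above can be pictured as a small metadata index in which each entry stores a link and an access type rather than the data itself. The following Python sketch is purely illustrative: the `DatasetEntry` fields, the `Access` categories, the `next_step` helper, and the example.org URLs are assumptions made for this example and do not describe Repositive's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Assumed access categories, mirroring the three routes in the text."""
    PUBLIC = "public"
    RESTRICTED = "restricted"
    COMMERCIAL = "commercial"


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical index entry: metadata and a link, never the data itself."""
    title: str
    source_url: str
    access: Access


def next_step(entry: DatasetEntry) -> str:
    """Tell the user how to obtain the dataset from its home repository."""
    if entry.access is Access.PUBLIC:
        return f"Download directly from {entry.source_url}"
    if entry.access is Access.RESTRICTED:
        return f"Submit an access application at {entry.source_url}"
    return f"Purchase a license at {entry.source_url}"


# Placeholder entries; the titles and URLs are invented for illustration.
index = [
    DatasetEntry("Open cohort", "https://example.org/open", Access.PUBLIC),
    DatasetEntry("Controlled cohort", "https://example.org/controlled", Access.RESTRICTED),
    DatasetEntry("Curated panel", "https://example.org/vendor", Access.COMMERCIAL),
]

for entry in index:
    print(f"{entry.title}: {next_step(entry)}")
```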
Besides dbGaP, Repositive has indexed databases maintained by the European Bioinformatics Institute, the Gene Ontology, the Sequence Read Archive, and ArrayExpress, among others, in its system and it continues to look for additional resources. In fact, datasets don't have to come from large consortia. For example, individual users who have accumulated data that they would like to make widely available for more in-depth analysis in combination with other datasets can also register the metadata from their repositories in the Repositive platform, Nielsen said.
In addition to making these individual datasets easier to find, Repositive is also working on improving the metadata that these developers include with their repositories by "allowing the users of the platform to comment [on] and tag the different data descriptions," Nielsen said. The aim is to help these developers provide more complete and more useful descriptions of their datasets, she said.
The company is also working with some larger repositories to capture their metadata in more structured ways, making it easier for the company to use its application programming interface to pull the metadata of these datasets into its platform.
"It's in our common interest," Nielsen said. "They want to have more visibility for their datasets and the easier it is for us to show the metadata, the easier it is for us to drive traffic to their datasets." The company is also talking with a number of commercial database vendors who are interested in showing their datasets in Repositive's system, hoping to catch the eyes of potential customers who don't mind paying a fee for more curated versions of public datasets, she added.
Repositive's data discovery platform has been in private beta since last year. So far, the response to the platform has been positive, Nielsen said, with users reporting that it makes searching for pertinent datasets a much simpler and less time-consuming prospect. For the beta, Repositive handpicked users and worked with them one at a time to test the platform, remove bugs, and add new features. One new feature they added as a result of the beta is to show simple metadata about each dataset at first and only provide additional descriptions if the user indicates interest in a particular dataset. This seemed preferable to showing all of the metadata about datasets right off the bat, Nielsen explained.
The next test will come when the platform launches in September and multiple users jump on the system, Nielsen said. It will be especially interesting, she said, to watch the community participation levels since a lot of beta testers expressed interest in getting feedback from other researchers about datasets that they have made available through existing repositories.
"Our ambition is to make it as easy as possible to access data [and] so the more that we can help the user access data that's in either public repositories or in other institutions by enabling both the technical solutions and the software that supports the governance around good data access, [the better]," she said. "That's what we are offering and right now, we don't see any others with that focus in the products that they offer."