NEW YORK (GenomeWeb) – Making good on its promise to bring its first product to market in 2016, bioinformatics software developer FactBio has officially launched the Knowledge Sharing Platform, or Kusp, a software solution for curating and annotating biomedical data using standard ontologies and identifiers that targets the pharmaceutical industry and academic researchers.
The launch follows a beta conducted in the first half of this year with 15 curators, data scientists, and bench biologists from pharma and academia, who tested the solution, provided feedback, and suggested modifications such as visualizing data in spreadsheets and incorporating additional ontologies for describing entities within the system.
FactBio CEO James Malone told GenomeWeb that the response to the beta was mostly positive, especially from curators, who see how a solution like this could help them work more quickly and efficiently. But, he added, the platform is also proving attractive to a much wider pool of users who are recognizing the value of data annotation and curation for their business but don't necessarily want to hire teams of trained data curators to do it. To that end, "we very specifically tailored the user interface to be as accessible to as wide an audience as possible," Malone said. "Anyone should be able to upload data, do some annotations, [and] tweak things ... without too much effort."
Kusp uses standard community ontologies for describing diseases, phenotypes, compounds, cell types, and more, as well as gene and protein identifiers, to add semantics to customers' data. Users load datasets into spreadsheets housed in so-called virtual BioBuckets within Kusp, where they can add descriptions of experiments, including details such as specific measurements taken or assays used, and automatically associate column entries with terms from the Gene, Phenotype, and Disease ontologies, among others.
These annotations make it possible for the system to, for example, identify both Crohn's disease and ulcerative colitis as types of inflammatory bowel disease and return datasets containing these terms in response to a user query, Malone explained. "The ability to pull those things together is possible because it understands the semantics [and] that these are types of IBD," he said, noting that it adds that "extra value" to the data that some other approaches such as text-matching, for example, may not be able to provide.
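The kind of query Malone describes can be illustrated with a minimal sketch. It is not FactBio's implementation: the small is-a hierarchy, the dataset names, and the functions `descendants`, `semantic_search`, and `text_search` are all hypothetical, and the hierarchy is a toy rather than an excerpt from any published ontology. The sketch expands a query term to all of its subtypes before matching it against dataset annotations, and contrasts that with plain text matching.

```python
from collections import deque

# Toy is-a hierarchy (child -> parents). Hypothetical, for illustration only.
IS_A = {
    "Crohn's disease": ["inflammatory bowel disease"],
    "ulcerative colitis": ["inflammatory bowel disease"],
    "inflammatory bowel disease": ["intestinal disease"],
    "celiac disease": ["intestinal disease"],
}

# Hypothetical datasets, each annotated with a set of ontology terms.
ANNOTATIONS = {
    "dataset_A": {"Crohn's disease"},
    "dataset_B": {"ulcerative colitis"},
    "dataset_C": {"celiac disease"},
}


def descendants(term, is_a):
    """Return the term plus every term that is directly or indirectly a subtype of it."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def semantic_search(term, is_a, annotations):
    """Return datasets annotated with the term or with any of its subtypes."""
    wanted = descendants(term, is_a)
    return sorted(name for name, terms in annotations.items() if terms & wanted)


def text_search(term, annotations):
    """Return datasets whose annotations contain the query as a substring."""
    needle = term.lower()
    return sorted(
        name
        for name, terms in annotations.items()
        if any(needle in t.lower() for t in terms)
    )


print(semantic_search("inflammatory bowel disease", IS_A, ANNOTATIONS))
print(text_search("inflammatory bowel disease", ANNOTATIONS))
```

In this toy setup, the semantic query for inflammatory bowel disease returns the Crohn's disease and ulcerative colitis datasets, while the text match returns nothing because neither annotation contains the query string.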
Moreover, everything annotated within the system is linked to so-called entry cards that provide more fine-grained information about the entity in question. For example, clicking on a gene name takes the user to a card that contains links to peer-reviewed resources that describe the gene as well as links to any associated proteins, SNPs, diseases, and phenotypes. "You have all of the background on each of the annotations alongside your data, [which] is actually quite an important thing that [beta testers] wanted," Malone said. "They wanted to be able to not just run [data] through an algorithm ... [but also] to see what it had done with the data."
Another advantage of Kusp is that it makes it much easier to compare internal information with third-party resources that also use ontology standards, and it lays the groundwork for widespread data integration, Malone noted. Some pharma firms use standard ontologies to describe data within their organizations, but others prefer bespoke standards, whose implementation can vary from one lab to the next within the same company.
"If you are doing a pre-competitive collaboration, which a lot of organizations are starting to do now, [and] you all have your own vocabularies and ontologies and gene identifiers ... putting those things together can be quite a challenge," Malone said. Kusp makes the task of pulling that data together for such research collaborations much more automatic. Furthermore, it can learn annotation rules entered by users and automatically apply them to new datasets, resulting in a more fluid annotation process over time, he added. It also learns from its mistakes and incorporates corrections into future data annotation activities, thus improving its accuracy.
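One simple way to picture rules that are learned once and reused is sketched below. This is an assumption-laden illustration rather than a description of how Kusp works internally: the `AnnotationRules` class, its methods, and the sample column values are hypothetical. User-supplied mappings from raw column values to ontology term labels are stored and applied to later datasets, and a later correction overrides the earlier rule.

```python
class AnnotationRules:
    """Hypothetical store of value -> ontology-term rules supplied by users."""

    def __init__(self):
        self._rules = {}

    @staticmethod
    def _key(value):
        # Normalize whitespace and case so "UC" and " uc " share one rule.
        return value.strip().lower()

    def learn(self, raw_value, term):
        """Record a rule or a correction; the latest entry for a value wins."""
        self._rules[self._key(raw_value)] = term

    def annotate(self, column):
        """Map each column entry to its term, or None if no rule applies yet."""
        return [self._rules.get(self._key(value)) for value in column]


rules = AnnotationRules()
rules.learn("UC", "ulcerative colitis")
rules.learn("CD", "celiac disease")    # an initial, mistaken rule
rules.learn("CD", "Crohn's disease")   # a user correction replaces it

# A new dataset column is annotated with the rules learned so far.
print(rules.annotate(["uc ", "CD", "IBS"]))
```

Entries with no matching rule come back as `None`, which marks the values a user would still need to annotate by hand.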
As CEO Malone sees it, in five years, a scientist should be able to pick up a dataset and understand what it means or, alternatively, feed the data into a computer and have it be able to make sense of the information provided. Consistent data descriptions and annotations are crucial to making that vision a reality and this is precisely what Kusp is designed to enable. When FactBio adds data integration capabilities to Kusp in the next couple of months, users will be able to reap the benefits of consistency. "If you use the standards, then you can plug into a lot of other public data that is available," he said.
FactBio can configure Kusp to align with users' needs and requirements, the company said. Specifically, it can deploy the platform on local infrastructure for larger pharma companies, while individual users and small to medium enterprises can access the platform as a secure cloud-based service. So far, at least three pharma companies have expressed interest in Kusp, according to FactBio.
In terms of pricing, GenomeWeb reported previously that the company planned to offer a set number of BioBuckets for free and then to charge a subscription fee for users who wanted to purchase additional BioBuckets. That will still be an option for users, Malone said, but the company also plans to offer individual and group licenses as well as site licenses for larger organizations. It will also offer a free student package that will give doctoral students easy access to the platform. Malone declined to provide specific pricing numbers in either case, but he did say that prices will be "very competitive" and that the firm will offer discounted pricing for academic users.
In terms of competition, FactBio believes that Kusp offers aspects of other companies' products but is unique in that it combines them into a single product, according to FactBio Chief Operating Officer Tony Stephenson. For example, there are companies such as TopQuadrant that develop ontologies as well as firms like Thomson Reuters that provide curated data. Kusp sits somewhere in between those two. It uses "various algorithms and other techniques alongside standards ontologies to make ontologies usable for the non-expert," Stephenson said in an email. Malone noted that some other companies offer more generic text-matching solutions, but "Kusp is not just about aligning text, it's really about aligning on semantics and meaning."