By Matthew Dublin
One of the rarely cited benefits of cloud computing is that small bioinformatics startups now have a fighting chance to demo and market the kind of ambitious compute solutions that traditionally only more established institutions or companies had the funding to support. Software tools for dealing with big data that previously would have required cost-prohibitive IT infrastructure to run can now be spun up on several nodes in the cloud, on the fly and on the cheap. This enables startups to provide working versions of their technology to build interest among investors and the user community until that lucky day when their technology is licensed by a big institutional or commercial customer.
Weblib, a small semantic search software company based in Brigantine, NJ, is using the cloud to do just that. I had the chance to chat about the company's use of the cloud with Weblib's CEO Tamas Doszkocs, a computer scientist and semantic technology expert who recently retired from the Specialized Information Services Division of the National Library of Medicine.
Back in March, Weblib won a National Library of Medicine software development challenge, "Show off Your Apps: Innovative Uses of NLM Information," for NLMplus, a semantic search and discovery application that uses a variety of semantic resources and natural language processing tools to produce improved search results from the National Library of Medicine's (NLM) collection of biomedical data and services.
"We are a tiny company, so it's very hard to compete with the big guys in the semantic search arena, like IBM with their Watson system. Those are awesome companies that can put hundreds of developers and unlimited resources on a particular problem," says Doszkocs. "In order to process all these full text biomedical databases, parallel processing is absolutely necessary, and the cheapest way of doing that is renting some time on the cloud. That way, these huge text databases can be broken into chunks to be processed in parallel in a way that's not that costly and takes only days to complete versus weeks on a desktop computer."
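The chunk-and-process pattern Doszkocs describes can be illustrated with a minimal Python sketch. It is not Weblib's code: the function names, the chunk size, and the word-counting step standing in for real semantic indexing are all assumptions made for illustration, and local threads stand in for rented cloud nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Split a list of text records into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_terms(chunk):
    """Hypothetical per-chunk job: count lowercase word frequencies.

    A real pipeline would run its semantic indexing step here instead.
    """
    counts = Counter()
    for text in chunk:
        counts.update(text.lower().split())
    return counts


def process_in_parallel(records, chunk_size=2, workers=4):
    """Process every chunk independently, then merge the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: a local thread pool stands in for cloud nodes; in a real
    # deployment each chunk would be shipped to a separate machine or process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_terms, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total
```

Because each chunk is handled independently, adding more workers shortens the wall-clock time without changing the merged result, which is what makes renting extra nodes for a few days attractive.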
NLMplus, which features a Google or Bing-like user interface, boasts as its primary innovation a semantic search engine that typically produces relevant search results from 1.6 million PubMed Review articles that are semantically indexed and searched. The NLMplus application also sends conceptually enhanced Boolean queries to NLM’s PubMed system of more than 21 million citations from the literature.
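To give a rough sense of what a conceptually enhanced Boolean query might look like, here is a small Python sketch. It is only an assumption about the general idea, not the NLMplus implementation: the synonym table is hypothetical, and the only PubMed behavior relied on is its support for quoted phrases combined with uppercase AND and OR.

```python
def expand_concept(term, synonyms):
    """Return a parenthesized OR-group of a term and its known synonyms."""
    variants = [term] + synonyms.get(term, [])
    quoted = ['"{}"'.format(variant) for variant in variants]
    return "(" + " OR ".join(quoted) + ")"


def build_boolean_query(terms, synonyms):
    """AND together one OR-group per concept in the user's search."""
    return " AND ".join(expand_concept(term, synonyms) for term in terms)


# Hypothetical synonym table; a real system would draw on a curated vocabulary.
SYNONYMS = {"heart attack": ["myocardial infarction"]}

query = build_boolean_query(["heart attack", "aspirin"], SYNONYMS)
```

The resulting string groups each concept with its synonyms, so a search phrased in lay terms can still match citations indexed under the clinical term.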
Doszkocs says that as Weblib's customer base grows, the company will also look to the cloud as an on-demand computational resource for its online retrieval service.