NEW YORK (GenomeWeb) – 5AM Solutions has unveiled an early version of 5AM Sunrise, a new web-based software tool designed to help pharmaceutical and biotechnology companies more efficiently manage, aggregate, and make sense of their clinical and molecular data.
The company demoed the new software-as-a-service solution at this year's Bio-IT World Conference in Boston last month. It hoped to get input and feedback from the community to help further develop the platform ahead of a three- to four-month beta that is scheduled to start in late summer or early fall, William FitzHugh, the company's chief science officer and research lead, told GenomeWeb. The company has also begun recruiting early adopters to try out the software during the beta and offer feedback on which features should be prioritized in the first release.
5AM hopes to launch the full solution by the end of the year. It will offer access to Sunrise for a monthly subscription fee that will cover all users in a given company or institution. The exact fee will vary depending on the size of the customer in question, according to FitzHugh. Because pricing details are still being fleshed out internally, he declined to provide specific numbers.
5AM partnered with Cloudera to develop the solution. FitzHugh told GenomeWeb that the Cloudera platform serves as the backend data management and storage infrastructure for the Sunrise software. Specifically, under the hood, Sunrise uses the Hadoop distributed file system and associated database technologies such as HBase to store and manage data. "The Hadoop infrastructure we felt was a natural fit for the underlying technology for 5AM Sunrise," he said.
Cloudera's platform is also one of the few commercially supported Hadoop systems. "When you are working with pharmaceutical companies, it's an advantage to have a commercially supported platform," he explained. "Our expertise is in developing applications, not supporting Hadoop."
Finally, Cloudera is interested in the life sciences and healthcare arena. Earlier this year, it pledged to provide over $3 million in software, training, and services to researchers in academic and government research institutions involved in the White House's Precision Medicine Initiative. These factors made Cloudera an ideal first partner for 5AM, but Sunrise is not exclusively tied to that system. "We do plan to be available on other implementations of Hadoop as well," FitzHugh said.
Sunrise is 5AM Solutions' first commercial software product. It has offered free software in the past. In 2011, it rolled out a free plugin for the Firefox browser called SNPTips that targeted customers of direct-to-consumer genomics services such as 23andMe. It let these clients compare their raw 23andMe genetic information with the latest SNP data from online journals and open data repositories.
But historically, the company has focused on custom software development for the pharma and biotechnology industries and for academia. Previous customers include QuantaLife (now part of Bio-Rad), which worked with 5AM on primers and probes for a digital PCR platform. 5AM has also built tools for groups at Howard Hughes Medical Institute's Janelia Farm and at the National Cancer Institute. It also worked with Asterand, a supplier of human tissue and tissue-based services, to electronically link that company's biorepositories and tools to help it identify, annotate, and distribute biospecimen data.
Over time, however, the company realized that it was essentially solving the same kinds of problems for clients over and over again, according to FitzHugh. Specifically, clients needed better tools for managing, integrating, and curating large quantities of clinical and genomic data stored in various internal files and databases. What usually happens, he said, is that data scientists at these companies spend most of their time manually cleaning up the data and writing bespoke scripts that allow them to integrate the information. "They end up not having time to do the analysis that's the really important part of their job."
That pain point is one that 5AM believes it can address, and it is the primary reason for refocusing its business, according to FitzHugh. Instead of building systems from scratch each time to help new clients with their data challenges, the more logical approach was to combine pre-built tools into a single product that the company could sell to customers. Among other benefits, "we can get recurring revenue and meet the needs with one product rather than multiple products," he said.
As part of that decision to refocus on product development, 5AM hired Earl Furfine as its new CEO a little over a year ago to guide the company's transition, drawing on his experience growing start-ups, FitzHugh said. According to his business profile, Furfine has founded or been a partner in seven technology startups.
The firm also began developing 5AM Sunrise to help researchers automate and streamline data cleaning and integration processes. The software could potentially be used in other industries, but the initial target market is pharma and biotech. "The idea behind Sunrise is that you can create the right entities and attributes for your business," FitzHugh explained. "[It] doesn't come pre-loaded with a set of things that it knows about. The user create[s] the specific things that they need to manage the data that their business cares about."
For example, a user might upload into Sunrise a spreadsheet containing information about patients in a clinical trial. The first time they upload the file, the user has to define specific fields and attributes in the system, FitzHugh said. They also have to create rules that the system uses to validate the values in the file, transform them to standard vocabularies, and map them to entities and attributes.
So in the clinical trial spreadsheet scenario, a user would tag the patient as the entity, and the attributes would be information about the patient provided in the file, such as gender, date of birth, and height. Once that's done, the user can establish rules so that when files are uploaded in the future, Sunrise can automatically assign values to fields and attributes. Users can also create rules for flagging values that don't fit a given field or attribute. For example, if the system finds a numerical value in a column meant for the gender attribute, it can mark that value for the user to check and correct before the data is loaded into the software. "There might still be some things that need to be addressed but the number would be much smaller," FitzHugh said. Overall, he said, the cleaning process is quicker and more efficient with Sunrise.
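The article does not say how Sunrise stores entity definitions or executes its rules. As a rough illustration only, the Python sketch below models a user-defined "Patient" entity whose attributes each carry a normalization rule, loosely following the workflow FitzHugh describes: values are mapped to a small controlled vocabulary, and values that fail a rule, such as a number in the gender column, are flagged for review instead of being loaded. The column names, the vocabulary, the plausible height range, and the `Attribute` and `load` names are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical attribute definition: the attribute name, the column it is read
# from in the uploaded file, and a rule that returns a normalized value or
# raises ValueError when the raw value does not fit.
@dataclass
class Attribute:
    name: str
    column: str
    normalize: Callable[[str], object]

# Hypothetical controlled vocabulary for the "gender" attribute.
GENDER_VOCAB = {"m": "male", "male": "male", "f": "female", "female": "female"}

def normalize_gender(raw: str) -> str:
    key = raw.strip().lower()
    if key not in GENDER_VOCAB:
        raise ValueError(f"not a recognized gender value: {raw!r}")
    return GENDER_VOCAB[key]

def normalize_dob(raw: str) -> date:
    return date.fromisoformat(raw.strip())  # expects YYYY-MM-DD

def normalize_height(raw: str) -> float:
    value = float(raw)  # raises ValueError for non-numeric text
    if not 30 <= value <= 250:  # assumed plausible range, in cm
        raise ValueError(f"height out of plausible range (cm): {value}")
    return value

# The hypothetical user-defined "Patient" entity: defined once, reused later.
PATIENT = [
    Attribute("patient_id", "PatientID", str.strip),
    Attribute("gender", "Sex", normalize_gender),
    Attribute("date_of_birth", "DOB", normalize_dob),
    Attribute("height_cm", "Height", normalize_height),
]

def load(text: str, attributes: list) -> tuple:
    """Apply every attribute rule to every row.

    Returns (clean_records, flags); each flag is (file_row, attribute, reason)
    and marks a value for the user to check before the row is loaded.
    """
    clean, flags = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row_num, row in enumerate(reader, start=2):  # row 1 is the header
        record, ok = {}, True
        for attr in attributes:
            raw = row.get(attr.column) or ""  # missing cells become ""
            try:
                record[attr.name] = attr.normalize(raw)
            except ValueError as err:
                flags.append((row_num, attr.name, str(err)))
                ok = False
        if ok:
            clean.append(record)
    return clean, flags

if __name__ == "__main__":
    sample = (
        "PatientID,Sex,DOB,Height\n"
        "P1,M,1961-04-02,178\n"
        "P2,female,1975-11-30,162\n"
        "P3,1,1980-01-15,170\n"
    )
    rows, flags = load(sample, PATIENT)
    print(len(rows), "rows loaded")
    print(flags)  # P3 is flagged: "1" is not a gender value
```

Because the entity definition is plain data, the same `PATIENT` list can be reapplied to later uploads of files with the same layout, which is the kind of reuse the article attributes to Sunrise's rules.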
In addition to file uploads, 5AM is working on a way for Sunrise to access and aggregate information from larger datasets where they are stored, rather than requiring users to upload them to the software, he added.
Once the data is in Sunrise, users can take advantage of built-in data exploration and visualization capabilities. For example, they can use scatter plots and Manhattan plots to compare clinical and genomic variables and explore associations and correlations, FitzHugh said. Sunrise also has logic that it uses to pick the right visualization and statistical analysis for comparing those variables, he said.
For example, if a user wants to explore a dataset with two numerical variables, the system might point them toward running a Pearson correlation and visualizing the results with a scatter plot, he said. On the other hand, if they have a numerical and a categorical variable (for instance, progression-free survival in days and the genotype of a particular SNP), the system would suggest a different visualization and run a different calculation of the significance of the correlation.
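The selection logic FitzHugh describes can be sketched as a dispatch on variable types. The article names only the two-numeric case (Pearson correlation with a scatter plot); for a numeric variable paired with a categorical one, this sketch assumes a box plot and a one-way ANOVA F statistic, which is one common choice but not necessarily what Sunrise uses (a Kruskal-Wallis test would be a reasonable nonparametric alternative). The function names and example values are hypothetical, and the sketch returns only test statistics, not p-values.

```python
from statistics import mean

def is_numeric(values) -> bool:
    return all(isinstance(v, (int, float)) for v in values)

def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def anova_f(groups: dict) -> float:
    """One-way ANOVA F statistic for a numeric variable split by category."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def suggest(x, y):
    """Return (plot, test, statistic) for a pair of equal-length variables."""
    x_num, y_num = is_numeric(x), is_numeric(y)
    if x_num and y_num:
        # Case named in the article: two numeric variables.
        return "scatter plot", "Pearson correlation", pearson_r(x, y)
    if x_num != y_num:
        # Assumed choice for numeric vs. categorical: box plot + one-way ANOVA.
        num, cat = (x, y) if x_num else (y, x)
        groups = {}
        for value, label in zip(num, cat):
            groups.setdefault(label, []).append(value)
        return "box plot", "one-way ANOVA (F statistic)", anova_f(groups)
    raise NotImplementedError("two categorical variables are not handled here")

if __name__ == "__main__":
    pfs_days = [100, 120, 300, 320]      # hypothetical progression-free survival
    genotype = ["AA", "AA", "GG", "GG"]  # hypothetical SNP genotypes
    print(suggest(pfs_days, genotype))
```

Computing a p-value from the F statistic would require the F distribution, for example via SciPy's `scipy.stats.f_oneway`, which is left out here to keep the sketch dependency-free.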
"Selecting and running those types of calculations and visualizations may be obvious to clients who have some statistical or bioinformatics expertise, but it could be a boon for clinical researchers who may not be as informatics-savvy," FitzHugh said. "It's a nice shortcut to be able to get to the right visualization very quickly that shows you the comparison you are looking for."
At Bio-IT, the company demonstrated Sunrise's ability to import genotype and clinical data and make comparisons between datasets. "We got a lot of good feedback," FitzHugh told GenomeWeb. Specifically, researchers were impressed with the software's visualization capabilities and its ability to import and integrate clinical and genomic data using shared attributes. The company also got good feedback about Sunrise's ability to capture metadata around datasets loaded into the system, along with input on how researchers in other contexts have handled metadata issues.
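The article does not explain how Sunrise links datasets through shared attributes. A minimal sketch, assuming the shared attribute is an identifier present in both the clinical and genomic records (here a hypothetical `patient_id` field), is an inner join like the one below; the function name and sample records are illustrative only.

```python
def integrate(clinical, genomic, key="patient_id"):
    """Inner-join two lists of records (dicts) on a shared attribute.

    Records without a match on `key` are dropped. If both sides share a
    non-key field name, the genomic value overwrites the clinical one.
    """
    # Index genomic records by the shared attribute for one-pass lookup.
    index = {}
    for rec in genomic:
        index.setdefault(rec[key], []).append(rec)
    merged = []
    for rec in clinical:
        for match in index.get(rec[key], []):
            merged.append({**rec, **match})
    return merged

if __name__ == "__main__":
    clinical = [{"patient_id": "P1", "pfs_days": 420},
                {"patient_id": "P2", "pfs_days": 180}]
    genomic = [{"patient_id": "P1", "genotype": "AG"},
               {"patient_id": "P3", "genotype": "GG"}]
    print(integrate(clinical, genomic))
```

Only P1 appears in both hypothetical datasets, so it is the only merged record; a real integration step would also need rules for reconciling identifiers that are formatted differently across sources.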
When Sunrise goes to market, it will have to compete with data integration offerings from companies like Informatica as well as visualization and analysis capabilities from companies like Tableau. Compared to those firms, Sunrise "provides a nice combination of the integration and curation process with visualization and analytics," according to FitzHugh.
More direct competition will come from Tamr, which offers a similar platform for combining and integrating data for analysis in various contexts and is trying to break into the genomics arena. Earlier this year, Tamr said that it would offer its platform and expertise for free to researchers affiliated with the White House's Cancer Moonshot Task Force to help them organize, unify, and integrate large quantities of genomic and other data for analysis.
There could be differences in the way each company's software integrates data that set them apart from one another, FitzHugh noted. But 5AM's significant domain experience, built over years of developing software for handling genomic data for companies such as Thermo Fisher Scientific's Ion Torrent, distinguishes it from newer entrants like Tamr, he said. "That's a real strength of ours."
Besides Sunrise, 5AM is mulling additional products for the marketplace, but these are still in the early stages, and FitzHugh declined to discuss details about them. The company will also continue to offer custom software development services in addition to selling products. "Doing innovative software projects ... is one of the ways we stay ahead of our competitors and keep our knowledge up to date," he said.