NEW YORK (GenomeWeb) – Chinese genomics service provider Annoroad has released SolarGenomics, a cloud-based platform that seeks to provide a one-stop shop for bioinformatics pipelines and data storage infrastructure.
Jitao Yang, Annoroad's director of information technology, described SolarGenomics as an ecosystem designed to support "the whole business process of genome sequencing services."
SolarGenomics targets small research organizations and clinical laboratories that want to incorporate omics data into their practices but may not have the resources to build and maintain internal compute clusters and data storage systems, according to Yang.
"Most research organizations or hospitals generally do not have capabilit[ies] for bioinformatics analysis or to establish [high-performance computing] centers," he said in an interview.
He noted that there are few software platforms that support this kind of work, and that existing platforms do not offer the same breadth of capabilities as SolarGenomics. For example, BGI Online, the commercial cloud platform developed and maintained by sequencing services provider BGI, offers bioinformatics analysis capabilities similar to those of SolarGenomics, Yang said. BGI Online offers pipelines for RNA-sequencing, whole-genome, and whole-exome analysis, among other kinds of analyses.
Where the platforms differ is in software developed specifically for sample and content management and tracking. When samples are collected, customers can monitor them in the cloud. Once a sample arrives in the lab for testing, cloud users can track its movement from sequencing through to interpretation using SolarGenomics' laboratory information management system.
SolarGenomics also provides paid access to the Annoroad Typical Chinese Genomes (ATCG) database, something BGI Online does not have, according to Yang. That database gives scientists access to whole-genome sequences from about half a million Chinese individuals, including millions of SNPs and SNP frequency annotation data, and is the first of its kind, he said. Annoroad intends to expand the database, including by adding new classification criteria and more advanced functionality for research and clinical applications.
The SolarGenomics cloud combines public and private cloud infrastructure, according to Annoroad. The public portion is built on the Alibaba cloud. The private portion is built on Annoroad's own onsite data center, and SolarGenomics can combine this resource with the computing and storage resources of the Alibaba cloud. Furthermore, the combined system ensures that customers have uninterrupted access to computing and storage resources, since each system can pick up where the other leaves off.
In terms of analysis capabilities, SolarGenomics offers free and paid pipelines for sequence data acquisition, quality control, analysis, interpretation, and reporting. The company currently offers about 60 free pipelines and plans to offer additional cost-free pipelines in the future. Pricing for access to the remaining analysis solutions varies depending on which pipeline customers want to use.
In addition to providing services on the cloud, Annoroad offers its software products to research labs and clinics that want to set up their own sequencing and bioinformatics services internally, Yang said. "We can help them run a computing center and we can also implement our bioinformatics pipelines [on] their compute [infrastructure]." The list of software available on the SolarGenomics cloud includes the FAQCS software, which handles sequence quality control; a tool for miRNA target gene prediction; and BWA_index, which is used for matching sequences to reference genomes.
SolarGenomics' pipelines are designed so that customers with little to no computing background can run them with ease, Yang said. All they have to do is upload their data, select which pipelines they want to use, and click a button to run them. Results are returned to customers within the cloud in anywhere from a few hours to a few days, depending on the pipeline.
More computationally savvy customers can combine analysis tools available on the cloud into automated pipelines or pick from more than 100 pre-existing pipelines available on SolarGenomics. Furthermore, experienced users can take advantage of command-line access to develop bespoke analysis capabilities. SolarGenomics also offers an online training cluster environment, an online forum, and a MOOC to teach customers to use analysis pipelines and other services.
So far, existing Annoroad clients that have used the cloud have responded favorably to the system, in particular because the cloud simplifies the task of providing data to customers in more distant provinces. Historically, Annoroad has had to ship hard copies of data to some of its clients. Now those users can simply access those datasets on the cloud, which is far more convenient and efficient, Yang said. Each customer's data is encrypted and isolated from other customers' data, which helps allay security concerns.
Furthermore, Annoroad offers to store data for customers after it is generated and maintains multiple backup copies of the data on the cloud so that clients don't have to take on that responsibility themselves, Yang said. The length of time that the data is stored depends on customers' requirements, although the company typically stores it for up to a year, he added.
Beyond that, customers can purchase storage space for themselves if they want to keep their data for later use. Storage costs are separate from the charges to use the paid analysis pipelines because long-term data storage is available on the Alibaba component of the system. As such, those costs are determined by the public cloud provider. Detailed per-month pricing for storage on the Alibaba cloud is available on the company's website.
According to Annoroad, the SolarGenomics platform currently stores and analyzes more than 15 TB of data, including data from customers of the company's existing services portfolio. In addition to computation services, Annoroad also provides DNA/RNA extraction, pooling, and sequencing services on a NextSeq 550AR sequencing instrument, which was put together in collaboration with Illumina and has been certified by the China Food and Drug Administration.