This article has been updated to correct Joel Dudley's previously reported position and affiliation as well as to reflect that change in the article's title.
Researchers at Stanford University and Mount Sinai School of Medicine (MSSM) have developed a cloud-based analysis platform that lets users identify variants in personal genome data obtained from sequencing service providers such as Illumina.
Scalable Tools for Open source Read Mapping, or STORMSeq, runs on the Amazon cloud and gives users open source tools, such as the Broad Institute’s Genome Analysis Toolkit, for exploring their genomic data without command-line programming or access to large compute clusters, Konrad Karczewski, a doctoral student in biomedical informatics at Stanford University and one of STORMSeq’s developers, told BioInform.
Karczewski and co-developer Joel Dudley, director of biomedical informatics at MSSM, began building STORMSeq after they were selected to have their genomes sequenced by Illumina as part of a pilot project to explore the clinical interpretation of results for the company's Individual Genome Sequencing service. Karczewski and Dudley used their own genomic data to put the platform through its paces.
Karczewski said the platform can also accept data from an exome sequencing pilot project launched by 23andMe, which provides participants with their sequence reads.
Using STORMSeq is “almost as easy as checking email,” he said. Although there are a few more steps than that, it's “nothing particularly complicated.”
To use the tool, users create an Amazon account, upload their raw data to an S3 storage bucket in the cloud, search for and select the STORMSeq machine image, and then run the pipeline, either with a default set of parameters provided by the developers or with settings of their own, Karczewski said.
The system also includes a progress bar that lets users monitor the pipeline’s activity. When the run finishes, it performs some quality control checks and then uploads the variants it finds back to the S3 storage bucket that held the raw reads.
In addition to other kinds of variants, STORMSeq includes insertions and deletions in its final output, information that current interpretation services, such as Illumina's, don’t include in the results they return to customers, he added.
Once users receive the VCF files containing their variants, they can analyze them further with one of a number of interpretation tools offered by companies such as Ingenuity and Omicia, or they can run their data through Interpretome, a separate variant interpretation program developed by the STORMSeq team, Dudley told BioInform.
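For readers curious what the insertions and deletions in a VCF file look like, each data line records a reference allele (REF, fifth column counting from one is ALT, fourth is REF) and one or more comma-separated alternate alleles, and an indel shows up as a length difference between the two. The short Python sketch below is purely illustrative and is not part of STORMSeq or Interpretome; the function names `classify` and `count_variant_types` are hypothetical, and classifying by allele length is a simplification that ignores variant normalization and complex substitutions.

```python
from collections import Counter


def classify(ref, alt):
    """Roughly classify one REF/ALT allele pair by comparing allele lengths (a simplification)."""
    # Symbolic alleles (<DEL>), breakends, missing (.) and spanning-deletion (*) alleles
    if alt.startswith("<") or alt in (".", "*") or "[" in alt or "]" in alt:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # equal-length multi-base substitutions


def count_variant_types(lines):
    """Tally variant types across the data lines of a VCF (any iterable of text lines)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta-information and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        for alt in alts:
            counts[classify(ref, alt)] += 1
    return counts


# Hypothetical example records, not real STORMSeq output
SAMPLE = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "1\t200\t.\tC\tCTT\t50\tPASS\t.\n",
    "1\t300\t.\tGTA\tG\t50\tPASS\t.\n",
    "1\t400\t.\tT\tC,TA\t50\tPASS\t.\n",
]

if __name__ == "__main__":
    print(dict(count_variant_types(SAMPLE)))  # {'snv': 2, 'insertion': 2, 'deletion': 1}
```

Because `count_variant_types` takes any iterable of lines, an uncompressed VCF can be passed as an open file object in place of the sample list.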
The browser-based Interpretome platform currently accepts only genotyping data from 23andMe, but the developers are working to make it handle full genomes, Karczewski said.
They are also working on incorporating variant annotation information from the published literature into STORMSeq to give users additional information about their variants, he said.
While STORMSeq itself is free, users do have to pay for time on the Amazon cloud, and costs vary depending on the amount of data and the parameters set by the user, Karczewski said.
The time required to run the pipeline also varies: it could take anywhere from a few hours for a small exome to about a week for a whole genome, depending on how much compute power is purchased and used and which parameters are selected, Karczewski said.
The developers have no plans to commercialize STORMSeq.