NEW YORK (GenomeWeb) – An Australian team of researchers is putting a publicly available CRISPR/Cas9 guide RNA (gRNA) design tool on the cloud, with hopes that it will help them and others determine the contributions of epigenetics to CRISPR activity.
GT-Scan2, a CRISPR design tool that offers both on-target and off-target analysis for given gRNAs, makes use of new cloud computing infrastructure from Amazon Web Services called Lambda functions, or microservices.
"Microservices are analogous to CPUs floating around in the cloud, available at a moment's notice," Denis Bauer, a bioinformatician at Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO), told GenomeWeb. "You don't have to have a whole server or a cloud instance. It's something you can just grab and do the compute with. You don't have to say, 'I need X many CPUs and X much memory,' you can just say 'This is my function, please execute.'"
Developed by Bauer and her colleague Aidan O'Brien at CSIRO, GT-Scan2 offers a way to quickly get information on potential gRNA effectiveness at hundreds, even thousands of sites in a gene, without having to write bespoke code to parallelize the computations on high-performance computing platforms.
Bauer is using the tool in collaboration with CSIRO postdoc Laurence Wilson and Australian National University researcher Gaetan Burgio to address the issue of computational predictions of gRNA effectiveness not quite lining up with experimental data.
"No tool so far is able to predict with 100 percent accuracy what the outcome should be" in CRISPR editing, she said. "There are so many moving parts that contribute to CRISPR function that it requires a lot more work to really be able to predict with precision accuracy what the actual outcome will be."
GT-Scan2, which is being used in research on how epigenetics impacts CRISPR for genome engineering applications in various tissues, could help give more clarity. In particular, the new program adds cloud-based, on-target analysis for thousands of gRNAs at a time. It's an update to a prior tool, GT-Scan1, which was an off-target finder. GT-Scan1 was also a cloud-based service, but it used a different technology to facilitate the search.
"In bringing in the on-target predictor, we quickly ran into resource limitations," Bauer said.
Say a researcher wants to find the best of hundreds or thousands of CRISPR target sites in a given gene. An old shortcut had been to focus on the first exon, but Bauer said that recent studies have shown that's not always the right way to go.
"You have to open up the search space to potentially the whole gene," she said. "If you want to do that in parallel, which saves time and makes it possible to even search that many potential candidates, you have two choices: Hadoop plus Spark or microservices."
Bauer said choosing Lambda functions/microservices over Hadoop and Spark was a purely technical choice and not biologically inspired. "The result and quality would have been the same if it were Spark, high-performance compute, or Lambda functions. The reason we chose to go with microservices is that the tasks you can define are modular. You can easily scale up as many functions or CPUs you need for your task to be executed."
Additionally, Lambda functions were cheaper. "In order to offer an 'always on' service requiring this much compute, we would have to run a Hadoop/Spark cluster continuously, which would have been prohibitively expensive," she said. "The alternative would have been to start the cluster on demand when a user goes to the webpage, but that would have taken a while before the cluster was ready for use (several minutes)."
GT-Scan2 joins several other cloud-based tools for CRISPR design. Benchling has also chosen to use AWS Lambda functions to power its CRISPR design tool. And recently, Broad Institute researcher John Doench partnered with Microsoft Research to put CRISPR design algorithms on Microsoft's Azure cloud platform.
The decision to go with microservices has yielded unforeseen benefits, Bauer said. "Amazon has a specific limitation on the amount of resources you can request from one function." At first, her team thought AWS would not cover their use case, "but we thought about how to work around the limitations, and the result is actually even better than if we had not had the resources limitation. Splitting the task even more makes it even more parallel, which makes it even faster," she said.
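The splitting Bauer describes can be illustrated with a small sketch. This is not GT-Scan2's code: it assumes SpCas9-style targets (a 20-nucleotide guide followed by an NGG PAM, forward strand only), uses a local thread pool as a stand-in for separate Lambda invocations, and scores guides with a hypothetical placeholder (GC fraction) rather than any real on-target model. All function names here are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

GUIDE_LEN = 20  # assumed guide length (typical for SpCas9)
PAM_LEN = 3     # assumed NGG PAM


def find_sites(seq, offset=0):
    """Return (position, guide) for every guide followed by an NGG PAM.

    Forward strand only; `offset` shifts positions back to gene coordinates.
    """
    sites = []
    for i in range(len(seq) - GUIDE_LEN - PAM_LEN + 1):
        pam = seq[i + GUIDE_LEN:i + GUIDE_LEN + PAM_LEN]
        if pam[1:] == "GG":
            sites.append((offset + i, seq[i:i + GUIDE_LEN]))
    return sites


def placeholder_score(guide):
    """Hypothetical stand-in score: GC fraction, NOT a real on-target model."""
    return sum(base in "GC" for base in guide) / len(guide)


def split_with_overlap(seq, chunk_size):
    """Yield (offset, chunk) pairs.

    Consecutive chunks overlap by GUIDE_LEN + PAM_LEN - 1 bases, so a site
    that straddles a boundary is found in exactly one chunk.
    """
    overlap = GUIDE_LEN + PAM_LEN - 1
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed the guide+PAM window")
    step = chunk_size - overlap
    for start in range(0, len(seq), step):
        yield start, seq[start:start + chunk_size]


def handle_chunk(task):
    """Work done by one 'function' invocation: find and score sites in a chunk."""
    offset, chunk = task
    return [(pos, guide, placeholder_score(guide))
            for pos, guide in find_sites(chunk, offset)]


def scan_gene(seq, chunk_size=200, workers=8):
    """Fan chunks out to workers, then merge hits in positional order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(handle_chunk, split_with_overlap(seq, chunk_size))
        hits = [hit for chunk_hits in per_chunk for hit in chunk_hits]
    return sorted(hits, key=lambda hit: hit[0])
```

Smaller chunks mean more, lighter tasks, which is the effect Bauer describes: a per-function resource cap forces finer splitting, and finer splitting allows more work to run at once. In a real deployment each chunk would go to its own function invocation instead of a local thread.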
The tool is available online for people looking at only one or a handful of genes, but Bauer's team has also developed an API for batch jobs and is open to providing the API calls to any interested parties.