Thomas Jefferson University's Computational Medicine Center has begun offering analysis services on HandsFree, a web-based system developed by researchers at the center for analyzing next-generation sequence data.
For a range of fees, researchers can upload mouse and human DNA, RNA, exome, and ChIP-sequencing datasets to HandsFree's server and specify the type of data and the kind of analysis they would like to run, such as microRNA expression profiling or SNP detection. The dataset is then quality-trimmed, preprocessed, and mapped to the corresponding reference genome. It is then analyzed per the customer's request, and the results are returned.
Analyses are typically completed within 12 to 18 hours, but those estimates depend on the number of projects in the analysis queue, the number of available processors, and the dataset's size. Upon completion of the project, customers are notified via email and have 30 days to retrieve both their input and output data from Thomas Jefferson's system before the data are deleted.
HandsFree accepts data from Ion Torrent, SOLiD, Illumina, and Solexa sequencers. Data from the SOLiD platform should be submitted in XSQ format, or include the _F3.csfasta and _F3_QV.qual file pair. Data from the other three kinds of sequencers should be provided in .fastq or .fastq.gz format.
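The format rules above can be expressed as a short pre-upload check. The following Python sketch is illustrative only and is not HandsFree's own validation code: the function name `is_accepted_submission` is hypothetical, and the case-insensitive matching and the lack of a check that the two SOLiD files share a prefix are simplifying assumptions.

```python
# Hypothetical pre-upload check based only on the format rules quoted above.
# Not HandsFree's actual code; names and matching rules are assumptions.

FASTQ_PLATFORMS = {"ion torrent", "illumina", "solexa"}


def is_accepted_submission(platform, filenames):
    """Return True if the file names match the formats listed for the platform."""
    platform = platform.lower()
    names = [name.lower() for name in filenames]  # assumption: case-insensitive

    if platform == "solid":
        # Either an XSQ file, or the csfasta/qual file pair.
        if any(name.endswith(".xsq") for name in names):
            return True
        has_reads = any(name.endswith("_f3.csfasta") for name in names)
        has_quals = any(name.endswith("_f3_qv.qual") for name in names)
        return has_reads and has_quals

    if platform in FASTQ_PLATFORMS:
        # Plain or gzip-compressed FASTQ only.
        return bool(names) and all(
            name.endswith((".fastq", ".fastq.gz")) for name in names
        )

    return False  # platform not on the accepted list
```

For example, `is_accepted_submission("SOLiD", ["run1_F3.csfasta"])` is rejected because the matching quality file is missing.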
Customers are charged for each job they submit to HandsFree. Pricing varies depending on the sequencer used, the type of read, and the kind of customer — academic versus non-academic.
For data from Ion Torrent's instruments, academic and non-academic clients are charged $10 and $15, respectively, for single-end read datasets. For all other machines, academic pricing for single-end reads is $177 for every 100 million raw reads, while non-academic clients pay $265 for the same amount. Pricing for analyzing mate-pair and paired-end reads from all the accepted sequencers is still being determined.
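As a rough illustration of how these rates translate into a per-job charge, the Python sketch below applies the quoted figures. It is not HandsFree's billing logic. It assumes, without confirmation from the center, that the Ion Torrent fee is flat per job and that the per-100-million-read rate is prorated linearly rather than rounded up to whole blocks; the function name `estimate_cost` is hypothetical.

```python
# Rates quoted in the article (US dollars, single-end reads only).
ION_TORRENT_FEE = {"academic": 10.0, "non-academic": 15.0}
RATE_PER_100M_READS = {"academic": 177.0, "non-academic": 265.0}
OTHER_PLATFORMS = {"solid", "illumina", "solexa"}
READS_PER_BLOCK = 100_000_000


def estimate_cost(platform, customer, raw_reads):
    """Estimate the charge for one single-end job.

    Assumptions (not confirmed by the source): the Ion Torrent fee is flat
    per job, and other platforms are billed in linear proportion to the
    number of raw reads.
    """
    platform = platform.lower()
    if customer not in RATE_PER_100M_READS:
        raise ValueError(f"unknown customer type: {customer!r}")
    if platform == "ion torrent":
        return ION_TORRENT_FEE[customer]
    if platform not in OTHER_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform!r}")
    return round(RATE_PER_100M_READS[customer] * raw_reads / READS_PER_BLOCK, 2)
```

Under these assumptions, an academic Illumina job with 100 million raw reads comes to $177, and a non-academic job with 250 million reads comes to $662.50.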
Academic institutions offering NGS data-analysis services are no longer novel. Many institutions, such as Harvard University and the University of Pennsylvania, have established core facilities that, among other things, provide DNA and RNA sequencing data analysis pipelines as well as technical expertise. However, HandsFree is unique because it wraps these capabilities into an automated resource that does not require human contact or input, Isidore Rigoutsos, the director of the center, told BioInform. It's also broadly available, whereas some core facilities only provide services for researchers in their home institutions, he said.
Furthermore, because HandsFree is web-based, researchers can submit projects at any time once they've created accounts and provided credit card information. When customers submit data, the system automatically calculates the cost of the analysis and proceeds with the project if the customer approves the transaction.
The system is also "very modular," Rigoutsos said. That means that the developers can easily incorporate new capabilities if users ask for them. He added that the center will prioritize the requests it receives based on the need for the capability in question.
"When you have something automated like this … it cannot answer all questions for all users, so we [opt] to address most questions for most users and be dynamic as we move on," he said.
HandsFree's developers began building the system nearly two years ago to provide easy access to the computational tools they were using to study non-coding RNAs. At the time, Rigoutsos said, researchers at Thomas Jefferson had begun sequencing data in earnest, and meeting the demands for data analysis was becoming increasingly difficult.
"It occurred to us that perhaps we can do the next best thing," which was to "provide access to the tools we were using for our own basic research work" and to do it in such a way that users wouldn't have to run the computations themselves, he said.
HandsFree combines publicly available tools such as cutadapt, which removes adapter sequences from sequencing reads, with internally developed capabilities. These include authentication codes and a secure protocol for transferring data from the HandsFree submission portal, which sits outside Thomas Jefferson's firewall, to the computational infrastructure used to run the analyses, which sits behind the university's security system, and back again once the project is complete. The list of bespoke capabilities also includes tools that generate expression profiles and ensure that reads that cannot be mapped to the reference genomes are discarded, Rigoutsos said.
In addition to their analysis results, customers receive genomic maps, which they can upload to software such as the UCSC Genome Browser for more in-depth studies. HandsFree generates maps for each chromosome individually and stores them in separate files for easier upload to external visualization tools.
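The per-chromosome split can be pictured with a small Python sketch. The article does not say what file format HandsFree uses for its maps, so this example assumes a BED-like, tab-separated layout with the chromosome name in the first column; the function name `split_by_chromosome` is hypothetical and the code is not taken from HandsFree.

```python
from collections import defaultdict


def split_by_chromosome(lines):
    """Group BED-like records by chromosome name.

    Assumption: each record is tab-separated with the chromosome in the
    first column. Blank lines, comments, and browser header lines are skipped.
    """
    per_chrom = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t", 1)[0]
        per_chrom[chrom].append(line)
    return dict(per_chrom)
```

Each resulting group could then be written to its own file, for instance one named after the chromosome, before being loaded into a genome browser.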
“The whole process is as hands-free as it can get for these kinds of datasets,” according to Rigoutsos. “The investigator still has some work ahead of them but the system does all the ‘heavy lifting’ for them taking the guess-work out and making this kind of analysis easy to harness.”