A recent survey of 24 core labs conducted on behalf of the Association of Biomolecular Resource Facilities' Genetic Variation Research Group found that data analysis is the primary challenge core labs face when it comes to next-generation sequencing.
Helaman Escobar, director of DNA Sequencing Operations at Eurofins MWG Operon, presented preliminary data from the survey this week at ABRF's annual meeting in Sacramento, Calif.
He said that the survey, conducted primarily by phone, highlighted data analysis as a key challenge for core labs that have adopted next-generation sequencing instruments. The second-most-cited challenge was the lack of resources to validate new protocols as new instrumentation comes online.
Yet despite these challenges, the survey revealed that core labs offer a wide range of informatics services for next-generation sequencing, including long-term storage of images and processed files, IT support, and downstream analysis, a term that Escobar acknowledged as "tricky" because every lab has its own definition for "downstream."
Escobar said that of the core labs that he surveyed, 21 percent provide no downstream analysis at all, while 13 percent offer only alignment to a reference genome. The majority of labs do offer some sort of bioinformatics support, however, with 29 percent of respondents saying they offer downstream analysis through their own staff, and 38 percent offering analytical services through a separate bioinformatics core.
The results suggest that even those labs that offer in-house bioinformatics support are open to commercial systems when they are available. Escobar said that 75 percent of the labs he surveyed offer some form of commercial bioinformatics software, with the most common vendors being CLC Bio, GenomeQuest, Partek, SoftGenetics, and Geospiza. He added that the other labs he spoke to said that they are "evaluating" commercial tools.
Escobar said that the commercial packages are generally used for specific applications, small projects, or as an option for customers who can't afford to pay the core lab to perform the analysis.
When it comes to LIMS, nine labs said they use no LIMS at all for next-gen sequencing, one said it was in the process of developing an in-house system, five said they offer a "partial" LIMS that doesn't include sample-tracking capabilities or other features, and five labs said that they are using an internally developed LIMS. In most cases, the in-house LIMS were adapted from something that was originally developed for handling microarray data, Escobar said. Only three labs said that they are using a commercial LIMS for next-gen sequencing data.
Labs are still struggling with the question of whether to store image files, and if so, for how long. Escobar said that 10 labs store all image files for sequencing runs, while nine said they have stopped doing so, and three labs said they currently store the image files but plan to discontinue that practice.
Of the labs that said they still store image files, one lab said that it plans to store them for 10 years, six said they store them for six months or more, four said "indefinitely," and three said "as long as possible," which in most cases was between two weeks and six months, Escobar said.
He added that some labs said that they tell their customers that they do not store the image files, but actually hold onto them just in case there is a problem and they need to re-analyze the data.
Regarding processed files, 70 percent of labs said that they store that information indefinitely, while others keep it between two weeks and a year. All but two labs said that they do not charge clients for storage. One lab said that it does not charge for storage but does charge clients for tape retrieval, and another said it charges for archiving raw files.
Storage arrays for the surveyed core labs ranged in size from two to 200 terabytes, with an average of 42 terabytes, Escobar said.
In terms of IT support, half the labs have IT embedded within the core lab, while the other half said that they have some sort of arrangement with other IT groups at their institutions. Of those labs, five said that they have to pay for IT support, either by the hour or under longer-term support contracts. Pricing arrangements included $80 per hour, $500 per terabyte for three years, and $2,000 per quarter.
Of those labs that offer downstream analysis, only five said that they charge an additional fee for that work, which ranged from $25 to $80 per hour. The remainder of the labs said that the analysis costs are included in their sequencing fees. One lab, for example, allots four hours of bioinformatics support for each sequencing project, which Escobar said essentially entails a description of where else the researcher should go for analysis.
Escobar said that of the 24 labs he surveyed, six had more than one type of instrument. The breakdown in terms of total sequencers across the cores was 15 Illumina Genome Analyzers, eight Life Technologies SOLiDs, eight 454 FLX, and one Helicos.
The most popular applications that respondents cited were RNA-seq and ChIP-seq, which are offered by 31 percent and 28 percent of the surveyed labs, respectively. Genome resequencing was the next most popular application, with 15 percent of labs saying they offer that service, followed by 13 percent offering small RNA analysis, 9 percent offering de novo sequencing, and 4 percent offering targeted resequencing.