In an effort to meet rising demand for easy-to-use bioinformatics tools that support next-generation sequencing instruments, CLC Bio is collaborating with a group of Danish academic institutes to develop a software suite that can analyze high-throughput sequencing data in a broad range of application areas.
The three-year initiative, called Seqnet, is funded by a $3 million grant from the Danish Agency for Science, Technology and Innovation. The project is not associated with an EU-funded initiative of the same name.
The partners plan to use CLC Bio’s Workbench environment as a standard user interface for a number of new algorithms under development at the company and the academic centers.
CLC Bio is currently developing an algorithm for aligning short reads against a reference genome for resequencing studies. That, along with associated genome and contig viewers, will likely be the first tools to come out of the project and should be available this spring, Roald Forsberg, senior scientific officer at the company, told In Sequence’s sister publication BioInform last week.
Forsberg said that the company and its partners will follow that with tools for de novo assembly, digital gene expression, metagenomics, SNP detection, and clustering and assembly of EST and cDNA sequences.
The centers, which include Aalborg University, Aarhus University, the University of Copenhagen, and the University of Southern Denmark, currently have three Illumina Genome Analyzers and two 454 Life Sciences systems between them, Kåre Lehmann Nielsen, an associate professor at the Department of Life Sciences at Aalborg University, told BioInform.
Nielsen, who is also heading the Seqnet project, said that while these new technologies have offered smaller research groups the sequencing capacity that was once only available at major genome centers, bioinformatics is still a critical bottleneck.
Unlike the big genome centers, “ordinary laboratories … do not have the bioinformatics expertise available to develop their own tools,” Nielsen told BioInform. Therefore, he said, his group and the other Danish institutes decided to pool their resources and work with CLC to “develop new and more user-friendly software for different applications.”
While next-gen sequencing vendors like Illumina and 454 provide some software with their instruments, Nielsen said that they have primarily focused on resequencing, leaving a dearth of reliable analytical tools for other applications.
For example, Nielsen said that his team at Aalborg University is particularly interested in using its Illumina Genome Analyzer for digital gene-expression analysis, but the software tools that come with the sequencer are currently “not good enough” for that application and there are “several issues that they don’t address.”
In particular, he cited error correction modeling as an important requirement. “If you want to exploit the fact that we can sequence deeper, you need to be able, in the low-abundance range, to be able to distinguish what is a true biological tag and what is a sequence variant error for a more abundant tag,” he said.
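To make the problem Nielsen describes concrete, here is a minimal sketch of one simple way to separate rare but genuine tags from error copies of abundant ones. It is not Seqnet's or CLC Bio's method, which the article does not describe. It assumes tags are plain A/C/G/T strings with per-tag read counts, considers only single-base substitution errors, and treats a tag as a probable error when a tag differing from it by one base is at least `ratio` times more abundant. The names `hamming1_neighbors`, `flag_probable_errors`, and the default ratio of 20 are hypothetical choices for illustration.

```python
def hamming1_neighbors(tag: str, alphabet: str = "ACGT"):
    """Yield every sequence that differs from `tag` at exactly one position."""
    for i, base in enumerate(tag):
        for sub in alphabet:
            if sub != base:
                yield tag[:i] + sub + tag[i + 1:]


def flag_probable_errors(counts: dict[str, int], ratio: float = 20.0) -> set[str]:
    """Return tags that look like sequencing errors of a more abundant tag.

    Hypothetical heuristic: a tag is flagged when some tag one substitution
    away has at least `ratio` times as many reads. Insertions and deletions
    (e.g. homopolymer errors) are not modeled here.
    """
    flagged = set()
    for tag, n in counts.items():
        for neighbor in hamming1_neighbors(tag):
            # A missing neighbor counts as zero reads, so it can never flag the tag.
            if counts.get(neighbor, 0) >= ratio * n:
                flagged.add(tag)
                break
    return flagged


counts = {"ACGTACGTAC": 1000, "ACGTACGTAA": 3, "TTTTGGGGCC": 4}
print(flag_probable_errors(counts))  # {'ACGTACGTAA'}
```

Here the three-read tag one base away from a 1,000-read tag is flagged, while the equally rare but isolated `TTTTGGGGCC` is kept as a plausible low-abundance transcript. A real error model would also weigh base-quality scores and platform-specific error profiles instead of relying on a fixed abundance ratio.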
In addition, current software tools for analyzing gene-expression data are insufficient to handle the amount of data that the Illumina instrument produces, he said. “In the last two weeks we have sequenced 130 samples, so we will need software to actually do more comprehensive visualization of gene-expression changes, and not just pairwise comparisons.”
Nielsen said that he also expects Seqnet to create tools that will automatically reformat files from different platforms so that they can easily be analyzed by the same software package. “I wouldn’t say you couldn’t do that now,” he said, “but you have to write a small script or develop something, and that’s not always available.”
Forsberg said that one of the key goals of CLC Bio is to ensure that all the algorithms developed in the project work with data from the 454 and Illumina systems, as well as from Applied Biosystems’ SOLiD sequencer and standard Sanger sequences.
He said that a number of the company's customers use "hybrid approaches" that combine multiple sequencing platforms, so CLC Bio plans to enable researchers to integrate different types of data. In addition, he said, the tools developed under Seqnet will be tightly integrated with CLC Bio's other software tools and with its Cube and Cell hardware accelerators.
Both 454 and Illumina offer some software that analyzes data from their instruments, and other bioinformatics vendors, such as DNAStar, have released products that handle data from currently available next-gen sequencers. However, Forsberg said that these tools “only cover a very small subset of applications” and offer “no integration with other bioinformatics functionality.”
CLC Bio is looking to create a platform that will allow researchers to “sequence a genome, assemble it, have a look at the assembly, and if you find, for example, spots with low coverage, you can have a closer look at that region and you can use the other functionality in the Workbench to design new sequencing primers, and go out and, for example, put a Sanger sequencing read across a region that was hard to resolve.”
While CLC Bio plans to release its reference genome assembly application first, Forsberg said that the “greatest potential” lies with the digital gene-expression tools it’s developing with Nielsen’s group. For one thing, he said, researchers prefer the digital signal that sequencers provide over the analog signal from microarrays. In addition, he noted, digital gene expression is of great interest to research groups studying organisms that have not yet been fully sequenced and for which microarrays have not yet been created.
As examples of where some vendors "see the [gene expression] market going," Forsberg pointed to Illumina's purchase of Solexa and ABI's decision to phase out its gene-expression microarrays in favor of the SOLiD instrument.
CLC Bio has not yet determined whether it will release the Seqnet tools as individual plug-ins for its Workbench platform or whether it will bundle them into a larger product. “That, in a sense, depends on the commercial interest, and it depends on how things work out in the market,” Forsberg said.
However, he noted that the company views the project as "a unique opportunity" to get input from a group of researchers using new sequencing technologies in application areas ranging from sequencing soil samples to human health.
“Next-generation [sequencing] is a very big focus area for us,” he said, noting that the company expects strong demand for the tools it is developing under Seqnet.
“I think the vendors thought when they built these machines that they were going to be putting them in huge genome centers, but they’re now putting them in labs all over the world,” he said. “I think a lot of these machines are going out to places that are going to be relying on good software.”
— This article originally appeared in last week’s BioInform, a sister publication of In Sequence.