BOSTON – Intel officials said this week that they hope to build an “ecosystem” of partners who will use programmable hardware to create a standardized approach for analyzing data from second-generation sequencing instruments.
Speaking at a workshop on second-generation sequencing data management ahead of this week’s Bio-IT World Conference, Wilfred Pinfold, general manager of integrated analytic solutions at Intel, said that the company has identified genomics as a key application for a technology it has developed that can more tightly integrate FPGAs and ASICs with the Intel platform.
With this system, the programmable chips are directly connected to the front side bus of Intel’s system so that they share memory with the general-purpose processor.
Pinfold outlined Intel’s interest in this market for BioInform last month [BioInform 03-21-08], but this week shed additional light on where the company believes it can help researchers struggling with the tremendous data outputs from next-gen sequencers.
“Primary data analysis seems to be where Intel can play the most useful role” in the field, he said, referring to the initial analytical steps in sequencing: image processing, base calling, and alignment and assembly.
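To make those stages concrete, here is a toy sketch of the primary-analysis steps Pinfold describes, from per-cycle fluorescence intensities through base calling to alignment. The four-channel intensity model, function names, and exact-match aligner are illustrative assumptions for this article, not any vendor's actual pipeline, which is far more sophisticated.

```python
CHANNELS = "ACGT"  # one fluorescence channel per base (illustrative)

def call_bases(intensities):
    """Naive base calling: pick the brightest channel at each cycle."""
    return "".join(CHANNELS[max(range(4), key=lambda i: cycle[i])]
                   for cycle in intensities)

def align(read, reference):
    """Exact-match alignment: return every offset where the read occurs."""
    return [i for i in range(len(reference) - len(read) + 1)
            if reference[i:i + len(read)] == read]

# Two sequencing cycles: channel T brightest, then channel A brightest.
read = call_bases([(0.1, 0.2, 0.1, 0.9), (0.8, 0.1, 0.0, 0.1)])
print(read)                   # TA
print(align(read, "GGTACC"))  # [2]
```

Real instruments perform these steps on millions of reads per run, which is why the compute burden discussed throughout the workshop is so large.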
Through discussions with some initial potential partners, Pinfold said, Intel estimates that it takes a typical lab around six months to design, purchase, and validate the IT systems required to support a next-generation sequencer. Since the field is still relatively new, such labs have little guidance to draw on, particularly those with little or no IT support staff.
Pinfold said that Intel envisions building an FPGA-based “appliance” that researchers could purchase alongside a sequencer, eliminating that six-month planning stage.
One key element of this approach, he said, is that the system would offer a standard for the community, enabling developers to write code that could easily be shared among users of the same platform. Most labs using clusters today can’t share their code easily because their systems are set up so differently, he said. This is particularly important in a field like next-generation sequencing, where algorithms continue to evolve quickly.
One downside to this scenario, however, is that such code would need to be written for FPGA-based systems – a specialized programming skill that many bioinformatics developers lack. Pinfold acknowledged this challenge and noted that it could present an opportunity for companies that develop FPGA-based algorithms and software-development toolkits.
“We suspect there will be a lot of business for those guys,” he said.
Officials from one such firm, Mitrionics, were present at the workshop. Michael Calise, executive vice president and US general manager, told BioInform that the company has spoken to Intel about its plans for this market but could not elaborate.
Intel’s Pinfold was purposefully light on specifics, noting that the company is still in the information-gathering stages of the project. He did stress that Intel doesn’t plan to create the solution entirely on its own, “but will work with the community to make it happen.”
In line with this, he said that the company has partnered with the Wellcome Trust Sanger Institute to create a portal called Genographia, which is a discussion board for a range of next-generation sequencing issues, including informatics support.
Anne Chapman, senior marketing manager for genomics at Intel, was also tightlipped about the specific hardware and software components that might make up the proposed appliance, but did provide a rough timeline for the project. She told BioInform that the company is aiming to have a set of partners and collaborators identified by the end of the second quarter, some initial results by the end of the third quarter, and an available system by the end of the year.
Vendors Make Do
In the meantime, manufacturers of next-generation sequencing instruments have found that they must provide a certain amount of computational power with their systems in order to perform primary analysis. At the workshop, representatives from Applied Biosystems, Illumina, and Helicos discussed computing platforms that they are shipping with their sequencers as so-called on-machine or “on-rig” systems.
While ABI and Helicos combined compute systems with the first versions of their instruments, the first version of Illumina’s Genome Analyzer did not offer any support for primary analysis. This required researchers to transfer all sequencing data off the instrument for analysis, a step that added considerable time to sequencing experiments.
Admitting that the company was “caught with our pants down” when it initially shipped the Genome Analyzer, Abizar Lakdawalla, senior product manager at Illumina, said that the company has addressed this issue with a new module called Integrated Primary Analysis and Reporting, or IPAR, which provides real-time quality control and online processing of primary data during sequencing runs.
IPAR is available with the company’s Genome Analyzer II, as “a standalone box,” or integrated with a research center’s in-house architecture, Lakdawalla said. The system, which runs Windows XP, includes a four-core HP ProLiant DL380 server and 3 terabytes of storage.
IPAR currently evaluates the performance of a sequencing run in real time and performs image analysis. Lakdawalla said Illumina will release an upgrade in June that will enable base calling.
Some users welcomed the change. Richard McCombie, a professor at Cold Spring Harbor Laboratory who oversees a lab running eight Genome Analyzers, said that his group initially had a number of informatics issues related to the instruments, and that it took around 24 hours to transfer data from a run and then another week to analyze it. However, he said that “many of these problems have been solved” due to improvements from Illumina as well as internal workflow procedures that his lab had developed.
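A back-of-envelope calculation shows why transfer alone could take that long. Assuming, purely for illustration, a run producing roughly 1 TB of image data moved over a link with about 100 Mbit/s of effective throughput (neither figure is from the article), the transfer time comes out close to the 24 hours McCombie cites:

```python
def transfer_hours(size_tb, mbit_per_s):
    """Hours to move size_tb terabytes at mbit_per_s effective throughput."""
    bits = size_tb * 1e12 * 8          # terabytes -> bits
    return bits / (mbit_per_s * 1e6) / 3600

# Hypothetical numbers: ~1 TB of run data over a ~100 Mbit/s effective link.
print(round(transfer_hours(1.0, 100), 1))  # 22.2 hours
```

On-rig processing sidesteps this bottleneck by reducing data at the instrument before anything crosses the network.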
The on-rig approach isn’t perfect, though. Matthew Trunnell, group leader in the Broad Institute’s Application and Production Support Group, said that the Broad has been working with the ABI SOLiD platform for several months and has found that alignment is “too big of a job for the on-machine system” for large genomes.
He added that while the SOLiD software allows researchers to monitor the experimental status of a single instrument, it would be helpful to have the option to monitor the status of multiple machines at once.
He said that the institute is “still assessing” how much hardware it will have to add to the SOLiD’s own computational system.
In a panel discussion during the workshop, instrument vendor representatives agreed that they would prefer to stay out of customers’ IT-purchasing decisions, but said they added computers to their instruments to help users.
Selling computers “is not a focus for us from a margin standpoint,” said Kevin McKernan, senior director of scientific operations at ABI. “We put the computer on board in order to facilitate the use of the system.” He added that since there were no tools off the shelf to do that, ABI decided its best option was to provide the IT as part of the system.
Likewise, all the vendors stressed that they are not looking to control the downstream analysis of the data that comes off the instruments. All of the companies said that the source code for their software is freely available to customers.