Richard Reinhardt, department head, automation and service unit, Max Planck Institute for Molecular Genetics.
As the head of the automation and service unit for the Max Planck Institute for Molecular Genetics, Richard Reinhardt has had a great deal of experience with Sanger sequencing as well as the 454 system. Late last year, the MPI Molecular Genetics group added to its sequencing capacity by installing an Illumina Genome Analyzer [In Sequence 01-02-07].
In Sequence spoke to Reinhardt via phone recently to get his first impressions of the new technology and the studies it is enabling at MPI Molecular Genetics.
You have had an Illumina sequencer since the end of last year. What has your experience with the system been?
The experience with the system is quite different. It’s not just an add-on like another [Applied Biosystems] 3730 sequencer or another 454 sequencer. It’s a different technology. It’s completely different in [terms of] just handling the [amount of] data, the raw data. This is really … a new generation of sequencer.
Keep in mind that by just doing one run with 25- to 27-base reads for a Solexa system, this will generate 0.6 terabytes of data. So just to copy these data from one disk to another, using a USB2 interface, which is not the fastest, onto another system, takes 10 to 12 hours. Definitely, there is no chance to move the dataset within the network … so you really have to make completely new plans to integrate such a system into your facility. You have to define new computer domains, where you have the system running, you have to have a separate server system to access the raw data and transfer it.
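The copy time quoted above can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 terabyte = 10^12 bytes) and uses the USB 2.0 nominal signalling rate of 480 Mbit/s as a theoretical ceiling; neither assumption comes from the interview, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the copy time for one run's raw data.
# Assumptions (not from the interview): decimal units and the USB 2.0
# nominal signalling rate of 480 Mbit/s as an upper bound.

RUN_SIZE_BYTES = 0.6e12                  # 0.6 terabytes of raw data per run
USB2_NOMINAL_BYTES_PER_S = 480e6 / 8     # 60 MB/s, never reached in practice


def transfer_hours(size_bytes, rate_bytes_per_s):
    """Hours needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 3600


def implied_rate_mb_per_s(size_bytes, hours):
    """Sustained rate (MB/s) implied by moving size_bytes in the given hours."""
    return size_bytes / (hours * 3600) / 1e6


if __name__ == "__main__":
    best_case = transfer_hours(RUN_SIZE_BYTES, USB2_NOMINAL_BYTES_PER_S)
    print(f"At the nominal USB 2.0 rate: {best_case:.2f} hours")
    for hours in (10, 12):
        rate = implied_rate_mb_per_s(RUN_SIZE_BYTES, hours)
        print(f"A {hours}-hour copy implies about {rate:.1f} MB/s sustained")
```

Under these assumptions, the 10 to 12 hours quoted corresponds to a sustained rate of roughly 14 to 17 MB/s, well below the nominal ceiling.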
The next thing you should also think about is to have a data mirror on your server. If something happens, you have to make a back-up, [but] it will take hours or days to store the data.
[As for] the processing itself, on a quad system, like a quad Opteron system, it will take between one and two hours, [and] can be sped up using not four microprocessors, but more. This is not a bottleneck. Still, [even after] having the data processed, you have to work with quite large files. They are still in the range of 50 gigabytes. Of course you can handle it, but it's not [easy] to shift it around quite often.
Any other [sequencing] system is easy to incorporate [into your IT infrastructure]. Here, you have to make a new plan, be redundant. You have to form specific domains, not only in cases [where] you interfere with activities of other users, but to not [have other users] interfere with this domain.
This is really something different from running a 454 system. We have also a 454 system here, which [was] easily incorporated [into our environment]. It produces less data, and you do not need such a high redundancy [when you sequence]. With 25 bases, you need a much higher redundancy — 40X, 80X, 100X coverage to sequence a genome and create an assembly.
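The link between coverage and sequencing effort can be made concrete with a rough calculation. The sketch below is a simplification that assumes every sequenced base is usable and that coverage is uniform. The 100-megabase genome is a hypothetical example, not a project mentioned in the interview, and the default yield of about 1 gigabase per run is the figure Reinhardt quotes later in the conversation.

```python
import math


def reads_needed(genome_bases, coverage, read_len):
    """Number of reads required to reach the target average coverage."""
    return math.ceil(genome_bases * coverage / read_len)


def runs_needed(genome_bases, coverage, yield_per_run_bases=1_000_000_000):
    """Number of full runs required, assuming every base is usable."""
    return math.ceil(genome_bases * coverage / yield_per_run_bases)


if __name__ == "__main__":
    genome = 100_000_000  # hypothetical 100-megabase genome
    for coverage in (40, 80, 100):
        print(
            f"{coverage}X: {reads_needed(genome, coverage, 25):,} reads of 25 bases, "
            f"{runs_needed(genome, coverage)} run(s)"
        )
```

In practice, unusable reads and uneven coverage push the real numbers higher than this idealized estimate.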
What projects have you used the Illumina platform for?
First of all, it took quite a long time to get experienced, [to] install the system, to find the bottlenecks of the system. At present, we have several projects going.
In one, we are trying to check genome-wide for SNPs in mouse and rat. The genomic sequence for mouse and rat is known, [so] now you can look for SNPs.
Nevertheless, in those cases, you need more than one run. We didn’t finish that project. [We have made] 10 runs so far and we obtained a good representation of SNPs to extract the most powerful and usable SNPs for characterizing various individuals.
Another project, a very new one — we are just presently in the starting phase — is we are looking for very short RNAs, microRNAs. Illumina just published the protocols for microRNA analysis on the Solexa system and this looks very promising for us. Nevertheless, we have to introduce coded adaptors [for this application], so you can mix more than one sample on one run, up to five samples right now, for our application.
From a total run, you get in the range of 1 gigabase, with 25-base-pair reads. At the moment, we run with 27-base-pair reads. On the flow cell, you have eight channels, and within each of these channels, you generate about 125 megabases of information. This is more or less overkill for looking for microRNAs. In that case, we have experience, [because] we have also done this on the 454 system. We are quite happy with data in the range of 15 to 25 megabases per sample. So we have done the same trick that we have used on the 454 system, [and] we have introduced coded adaptors.
They do not need to be that long if you mix five samples. You just need four bases for coding. That’s, of course, information you will lose from the read, [so] the total read length is reduced from 27 to 23, but I still believe this is a powerful method. We have not compared this yet with standard capillary sequencing or 454 sequencing, but from the first [results, it appears that] this might be [a] really powerful project.
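The coded-adaptor arithmetic can be sketched as follows. This is a minimal illustration, not the group's actual pipeline: the barcode sequences and sample names are hypothetical, matching is exact with no error tolerance, and the yield estimate assumes that a channel delivers about 125 megabases at 27-base reads and that the five samples are represented equally.

```python
def min_barcode_length(n_samples, alphabet=4):
    """Smallest barcode length whose code space covers n_samples."""
    length = 0
    while alphabet ** length < n_samples:
        length += 1
    return length


def usable_bases_per_sample(channel_bases, read_len, barcode_len, n_samples):
    """Bases left per sample after the barcode is trimmed from each read."""
    reads = channel_bases / read_len
    return reads / n_samples * (read_len - barcode_len)


def demultiplex(reads, barcodes):
    """Split reads by leading barcode (exact match) and trim the barcode.

    barcodes maps barcode sequence -> sample name (hypothetical values).
    Returns (bins, unassigned).
    """
    barcode_len = len(next(iter(barcodes)))
    bins = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        sample = barcodes.get(read[:barcode_len])
        if sample is None:
            unassigned.append(read)
        else:
            bins[sample].append(read[barcode_len:])
    return bins, unassigned


if __name__ == "__main__":
    print("Shortest barcode for 5 samples:", min_barcode_length(5), "bases")
    print("Codes available with 4 bases:", 4 ** 4)
    per_sample = usable_bases_per_sample(125e6, 27, 4, 5)
    print(f"Usable bases per sample: {per_sample / 1e6:.1f} megabases")

    barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical codes
    reads = ["ACGT" + "A" * 23, "TGCA" + "C" * 23, "GGGG" + "T" * 23]
    bins, unassigned = demultiplex(reads, barcodes)
    print({name: len(seqs) for name, seqs in bins.items()}, len(unassigned))
```

Two bases would already distinguish five samples, so a four-base code leaves most of its 256 combinations unused. Under the stated assumptions the estimate comes to about 21 megabases per sample, which falls within the 15 to 25 megabases Reinhardt describes as sufficient.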
The other thing that we have done so far is expression profiling. We have been generating cDNA and just looked for the content in various tissues. This is normally done on microarrays, [which] you can get for a reasonable price in human, mouse, and rat, where there are prefabricated microarrays available. For instance, the Illumina BeadStation is very powerful [for that], or you can use arrays from Affymetrix.
But if you … go to other model organisms where you don’t have prefabricated microarrays ... it’s a good idea to do the profiling of various tissues [on the Solexa platform]. It’s not that you only detect which ESTs are there but you can also count them. The method is not only qualitative but also quantitative.
And of course — but this is nothing new — we [also] use the Solexa system for standard sequencing: genomic sequencing, for instance, looking at BACs. That was the first thing we really did. I think, honestly, this system makes, from our experience, [the most] sense for whole genomes.
How do the 454 and the Illumina platforms compare?
The greatest difference is the cost — how [much] you have to invest for generating 1 gigabase on the Solexa system and 1 gigabase on the 454 system. To make a long story short, it is definitely more expensive to work on the 454 system, even [though] you do not need the same high coverage as on the Solexa system because of the longer reads.
With the old [GS20] system, the read length was 100; on the new FLX system, it's 250 bases. This definitely reduces the necessary coverage, but generally, 454 is four to 10 times more expensive [than the Solexa instrument].
Roche and 454 have recognized this as being a bottleneck for others using the system [and] the rumor is that the price policy will be changed. I hope that will be the case. If the 454 system would come down to the price level of the Solexa system, I believe it would [be] much easier to work with the 454 system. Also, [it would help] if they could increase the throughput. At the moment, with the upgraded FLX system, it’s in the range of 100 megabases to 150 megabases. If they increase the read length to 500 base pairs in 2008, that will double the output, which would increase the possibilities for the system.
Overall, the 454 system is much easier to use [than the Illumina sequencer].
Where do you think next-generation sequencing is going?
At the moment, it’s mainly the ABI SOLiD [that] has started an early-access program. Of course, we are [in discussions] with them about such a system. There is a lot of interest there on both sides but we have to check whether we can meet their deadlines [for the early-access program]. We are interested in that. Also, [we have] a long-lasting cooperation with ABI.
How do you think the Illumina and SOLiD platforms will differ mainly? They seem to be quite similar in many regards.
In many aspects, they look alike. They both have ultrashort sequencing read lengths of 25 to 35 bases. [Illumina’s] will increase in 2008 to up to 50 bases. I have seen two major differences. One is the capacity. Even at the end of October last year, when I was in Beverly [Mass.] to see the first samples of the SOLiD machine, it was clear that they were working with two flow cells in parallel, while Solexa has only one flow cell. As a result, I believe the SOLiD will have twice the throughput. This is a difference, but this is not a remarkable, dramatic difference in my opinion.
What is more interesting is that they have developed a very nice error-correction method. This is really a remarkable difference, especially for medically based resequencing projects. When you are looking for SNPs or mutations, this gives you a much higher confidence level that this is a real mutation and not a sequencing error. This is a remarkable difference, plus having twice [the] throughput. In other regards, they are not much different. For the SOLiD system, it may be an advantage having this high-confidence level in SNP and mutation detection and could be an interesting feature for clinical projects and applications.
Regarding other systems, it's too early [to talk about at the moment].
Is there anything else you would like to mention?
First of all, I do hope that we get some more flexibility in the price policy of the companies, especially regarding the 454 platform. The [patent for the] pyrosequencing [technology] used in the 454 system was [granted] to Mathias Uhlen [of the Royal Institute of Technology, Stockholm] in 1988. This will soon be 20 years [old], so the patent will be running out. There should be more flexibility by other companies entering this market. I do hope so.
For the ultrashort sequencing, I hope [that] having three companies [Illumina, ABI, and Helicos] in the same area will lead to decreasing prices. This is exactly what we need.