Interdisciplinary Center for Biotechnology Research, University of Florida
Name: William Farmerie
Title: Associate Director, Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, since 2006
Experience and Education:
Director Genomics Research, ICBR, University of Florida, 1998-2006
Scientific Director, Recombinant Protein Expression Core, ICBR, University of Florida, 1992-98
PhD, Biomedical Sciences, University of Tennessee-Oak Ridge, Oak Ridge National Laboratory, 1980
BS, Biology, Florida State University, Tallahassee, 1973
Core facility managers are getting together in Tampa, Fla., this week at the Association of Biomolecular Research Facilities’ annual meeting, where next-generation sequencing is on the agenda.
Bill Farmerie’s Interdisciplinary Center for Biotechnology Research at the University of Florida, Gainesville, is one of the few academic core facilities that have acquired a 454 Genome Sequencer, which it obtained through a “rather unique” university-wide resource-pooling effort.
Last week, In Sequence asked Farmerie about his experience with the GS 20 technology, which his lab has used for more than a year and recently upgraded to the GS FLX, and about his advice for other cores that are considering a next-generation sequencing instrument.
When did you decide to acquire a 454 Genome Sequencer?
We became aware [of it] at the ABRF meeting in 2005. We had a fairly extensive Sanger sequencing operation [using] five [GE Healthcare] MegaBace instruments — four 1000s and one 4500. For a garden-variety core facility, we were doing a lot of Sanger sequencing.
We were a little bit lucky in the sense that 454 descended from CuraGen, [which] at one point in time had a satellite facility here, affiliated with the University of Florida. We knew some people involved in the development [and] we made a trip up to [454 in] Branford[, Conn.,] in April or May of 2005.
We felt pretty confident that this was the right thing to do. What we had seen as we increased our Sanger-based capacity is, as you drive down the price, people just do more of it. They will still spend the same amount of money on sequencing, but they get more sequence for it. This technology, if it meant you could sequence a bacterial genome for, say, $20,000 or less, you have made [genome sequencing] available to a whole group of people that could not do it before.
Did users request that you get the instrument?
Oh, no, definitely not. We were the ones that rang the bell. We saw what the technology was, believed very strongly that it was very workable, and then we just started selling the idea here on campus. I talked to the chairman of microbiology and cell science in the College of Agricultural and Life Sciences, Eric Triplett, and he said right away, ‘We should do this.’ He took up the campaign to generate funding inside the university to acquire the instrument. It’s $500,000, and, depending on what you already have, you could spend up to another $100,000 to equip the lab to [run] the technology.
Where did the funding come from?
This was rather unique. [Triplett] called other chairs, and other departments. So people chipped in, $25,000 here, $50,000 there, $10,000 from someone else. They got up to pushing into the $300,000s in commitments. The big contributor was the College of Agricultural and Life Sciences, but also very significant contributions came from the College of Medicine. Then it was pretty easy to go to the vice president of research and say, ‘Look, we have got commitments towards two thirds of the cost of this instrument’ [and he paid the difference.] ICBR paid for all of that other stuff beyond the instrument.
What else did you need besides the instrument?
Depending on what you are already equipped with, you need [things] like an Agilent Bioanalyzer, you need a [tissue lyser] for making emulsions, [and] you need some bead counters. Some of the stuff is not absolutely necessary, but if you are processing a lot of samples, it’s awfully convenient.
[To prevent contamination], it’s highly recommended that you set aside a dedicated room where you do the pre-PCR portions of the library construction. You don’t want equipment going back and forth between the [contaminated side of the] lab and the clean room, so you need to equip this with its own sets of pipetters, little microfuges, a refrigerator, and a freezer, and put in an externally vented hood. We probably spent $50,000 to $65,000, something like that. If you buy everything, if you pretend you have nothing, it probably costs you around $100,000.
When did you install the instrument, and when did you start operating it?
We took delivery on the instrument at the end of August of 2005. 454 had a very thorough training regimen. Everyone who was involved in the process — four of us — went to [454 in] Branford for a week and worked at their measurement facility, learning to do library construction, doing the whole thing, and seeing how they did it. That was really important, to just see how they set up. The usual training regimen is, you spend a week in Branford, and then they come here, and you do essentially everything on your side. By the time we had done both of our training sessions, we went operational around the 1st of November of 2005.
How many people operate the instrument?
One person handles the pre- and one the post-PCR side. The pre-PCR person more or less is a full-time person. We also do enough non-standard library preparations [such] that we are not just following the standardized protocol. The post-PCR person is just breaking the emulsions, purifying the beads, and putting them on the instrument. We have a very skilled person who manages that side, but the actual [people who] do it are graduate students from engineering, they work part-time for us. They have never had any molecular biology experience. It’s just attention to detail [that’s required.]
How much do you charge for using the GS 20?
For the library construction and titration, [we charge] about $4,200 to $4,300, [and for] a production run on the GS 20 about $7,000. If you want to do subsequent full production runs, that’s an additional $7,000. That’s pretty close to our cost.
I feel like, in many cases, the cost was small enough that [initially, investigators] were able to pull it out of non-committed resources. When the instrument first got here, they did not have a grant that specifically said they were going to spend money on 454 sequencing, because they could not possibly have known that.
I got some feedback from investigators who proposed a lot of 454 sequencing very early in the game, and I think some of the funding agencies, I am thinking of [the National Science Foundation], were a little skittish. I think now, [the technology] stood the test of time, so larger projects are having money specifically ear-marked [for 454 sequencing].
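The GS 20 fees Farmerie quotes above add up in a simple way: one library construction and titration charge, plus a flat charge per full production run. The short Python sketch below only illustrates that arithmetic. The function name and the $4,250 library fee (the midpoint of his "$4,200 to $4,300") are assumptions made for this example, not ICBR's actual price list.

```python
def gs20_project_cost(production_runs, library_fee=4250.0, run_fee=7000.0):
    """Rough project cost in US dollars, from the fees quoted in the interview.

    library_fee: assumed midpoint of the quoted $4,200 to $4,300 for
                 library construction and titration (charged once).
    run_fee:     quoted figure of about $7,000 per full production run.
    """
    if production_runs < 1:
        raise ValueError("a project needs at least one production run")
    return library_fee + production_runs * run_fee


if __name__ == "__main__":
    # Print the estimate for one, two, and three full production runs.
    for runs in (1, 2, 3):
        print(f"{runs} run(s): ${gs20_project_cost(runs):,.0f}")
```

On these assumed figures, a single-run project comes to roughly $11,250, and each further full run adds about $7,000.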
Did you ever find yourselves competing with 454’s service center?
Initially, they recognized that in order for this technology to catch on, they needed to have real projects be done, and people talking about it in meetings. I think that they were doing work for less than those of us on the outside could do it. After all, they were making the kits. I was not the only one that mentioned that fact to them. Because I had investigators that kicked in money to buy our instrument who were hearing from colleagues that they were actually having things done at 454 for less than we could do it. I’d say, ‘I know what I pay for my kits. Unless I’m going to lose money on everything I do, I can’t match that price.’ My understanding is that eventually that did get rectified, and that the cost of having things done at 454 has increased.
Also, I know that some people from outside the university that have contracted with us to do runs on our instrument came to us because our queue was much shorter than [454’s] was. They seem to be pretty busy.
Who has access to the instrument, and how well has it been accepted?
It’s highly accepted. We have done well over 100 runs on the instrument. It’s a publicly available service. Easily the bulk of what we do, 80 to 85 percent, is on-campus work, and the rest of it comes from various places off campus. The machine is pretty heavily used. A lot of projects right now were stalled because everybody knew we were getting the [GS] FLX upgrade on the instrument. They knew the longer reads and the [higher] yield would benefit their projects. I helped bring [the new FLX] upstairs yesterday from the loading dock. I am guessing that somewhere next week we will be operational on the FLX instrument. [But] we have continued to be busy on the GS 20, because there are certainly things that actually, you can do better on the GS 20 than on the GS FLX.
Where does each instrument have its strength?
Where length matters, as you are assembling something, every nucleotide you can get is money in the bank. But if what you are doing is just ID-ing things, you need enough unique sequence that you can get a Blast hit. Or if you are just counting; the application where this really matters is things like transcriptome analysis. In that case, if you have got a 100-nucleotide sequence, which you get off the GS 20, you can pretty much unambiguously assign [it], you don’t need 200 to do it. The FLX will give you a longer read, but at a cost, because you are doing more cycles, you are burning more reagents, and it’s more expensive.
Isn’t it possible to tune the FLX down, to do fewer cycles?
Since we are an early adopter of the GS 20, we are earlier in the acquisition of the FLX. There will be kits available to run smaller plates and do fewer cycles on the FLX, but they are not available at this time.
What have been the most important applications for your instrument so far?
There was exactly one written application for the instrument when we acquired it, and that was bacterial genome sequencing. The things we were asked to do on the instrument were virtually everything but that, at least initially. The bacterial genomic work has picked up quite a bit. Between mycoplasma and bacteria, I suppose we have done more than 12 different ones on the instrument. We have [also] done a lot of whole-transcriptome sequencing from normalized cDNA preparations. We have done ultra-deep sequencing on 16S ribosomal DNA [as well].
How important is the bioinformatics and data-handling end of it?
I’d like to think that because we were handling a lot of Sanger sequencing, we were not naïve about handling the data. We already had a pretty good bioinformatics pipeline in place for doing assemblies, doing Blasting, annotating sequence information, and delivering it back out to clients.
It’s one thing to [run the instrument], but it is another thing when you are managing 25 to 30, maybe 40 different individual customers that have done these projects. When you have that many customers, delivering the output can really turn into a headache. But we had developed some web-based tools for delivering Sanger sequence, and it was a pretty easy transition. We just had to focus more on the genome scale. So we had to develop some visualization tools, and web-based tools, so that groups of investigators who were contracting for these genomes [could] work collaboratively on the annotation of their genome sequences, independent of us.
We realized when we were ramping up our Sanger sequencing operation that if we just dumped sequence on [faculty investigators], the whole [research] process would die. You want them to use their intellect, their training. They don’t do bioinformatics, they don’t worry about sequence assembly, they don’t want to worry about setting up some operation for doing large-scale Blasting and then parse that information and get that onto a display, so they can look at the information.
If we did not help them get their publications out and provide them with data in a format that can go into a grant proposal, the funding circle would be interrupted, and nobody would ever do another project. [So] we started building a new informatics group in 1999 or 2000 just to do these things. We started small, but we have four full-time PhD-level bioinformaticists. And I know we are small by comparison to a lot of places.
But computationally, how many CPU cycles do we dedicate to 454 now? I’d be surprised if it wasn’t 70 or 80 percent. We have a 28-processor cluster to do Blasting on. And it’s adequate, but I know I am teetering on the edge of inadequacy. While we can get along right now, and nobody has complained we are not getting data out fast enough, I can see how the FLX instrument generating that much more data is just going to take another bite out of my CPU time. And it’s not just about Blasting, there is plenty of other stuff.
[With new instruments like Illumina’s Genome Analyzer, or ABI’s SOLiD], instead of having to deal with 100 million bases at a time, we are going to deal with data in gigabase chunks. And we are going to be doing that by this time next year. And we better be as proactive as we possibly can about figuring out how we are going to deal with that. That’s why I am glad we got into this eight or nine years ago, so each one of these things isn’t quite a shock.
Are you planning to obtain an Illumina Genome Analyzer or an ABI SOLiD?
I sure do hope so. I have an NSF major equipment grant application in to acquire a next-generation machine. I don’t have any idea whether we are going to be successful. That request was both for computation and for a next-generation machine. You cannot ask for one without asking for the other.
Have you decided which one you would buy – Illumina’s or ABI’s?
I have not made up my mind about which way to go if I could do that. If I could say I was leaning in one direction, it would be towards the ABI, only because of history. They are ABI. And they have been doing sequencing for a long time, and they have very deep pockets. I think they will make their instrument be very successful. And their technology has some interesting [features] — this notion of sequencing by ligation is really very interesting. The fact that you can sequence either forward or backward, and you can reset this thing simply by denaturing everything, there is something very appealing about that.
I know I don’t have to commit to anything now but I worry about determining which way I go when I actually have to make a decision, and I am going to be watching very closely how the instruments are maturing, and then we will decide.
Do you have any advice for other cores that are only now looking into a next-generation sequencer for the first time?
Without question, computation is important. If you want to add value to sequence, you have to get computation. And asking the individual investigators to do it is perhaps asking too much. If you want to create a firestorm, just get a sequencer, and don’t have any computation, and believe me, you will generate a train wreck, and force someone to solve the problem, because you are going to have more data than you can possibly imagine. You need people who are computationally savvy and focused on information management, or genetic information management.
Otherwise, if you are a decent molecular biologist, doing the molecular biology for these instruments is not tough. It was a heck of a lot tougher to develop the infrastructure that we needed to do Sanger sequencing on a large scale. Generating the library, not to excessively trivialize it, it’s just ordinary attention to detail, it’s basic molecular biology, and 454’s kits are well thought-out. The instructions are fool-proof. On the data generation side, there is nothing to fear. It’s all on the computational side.
Generally speaking, we are biologists, we are not trained to deal with computation. But you need storage on a scale that you can’t imagine. You will generate terabytes of data in a heartbeat. And these next-next-generation technologies [will] generate terabytes of data per run.