Name: Ralph Schlapbach
Position: Scientific coordinator and managing director, Functional Genomics Center Zurich of ETH Zurich and the University of Zurich, since 2002
Experience and education:
Group leader and head of microarray core facility, Max Planck Institute for Infection Biology, Berlin, 2000-2002
Postdoc, University Hospital Zurich, 1999-2000
PhD, ETH Zurich, 1998
Undergraduate degree in biochemistry, ETH Zurich, 1995
Founded 10 years ago by the University of Zurich and ETH Zurich in Switzerland, the Functional Genomics Center Zurich has supported the research community with a range of 'omics technologies, including next-generation sequencing, microarrays, and mass spectrometry.
The center, which primarily serves users of the two founding universities, is equipped with a host of next-gen sequencers from Illumina, Life Technologies, Roche/454, and Pacific Biosciences and plans to bring in new platforms before the end of the year. It currently has a staff of 40, an interdisciplinary team of experts with backgrounds in biology, biochemistry, molecular genetics, physics, chemistry, computer science, and mathematics.
Last week, In Sequence spoke with Ralph Schlapbach, the center's scientific coordinator and managing director, about the FGCZ's operations and its strategy for next-gen sequencing. Below is an edited version of the conversation.
How does the FGCZ operate?
The operational mode is not like that of a classical core facility; rather, the largest part of the center operates as a so-called user lab with local research groups from the two universities as the primary customers. They apply with research projects at the center, which are evaluated first on a technical basis and also on scientific merit. Once these projects are approved, the staff, mostly junior researchers at the PhD or postdoc level, come to the center and get training in using the instruments, whenever this is possible, as well as in data analysis methods, bioinformatics, and statistics. We try to do that as a collaborative effort, starting discussions with the research groups early on, including experimental design, sample prep issues, through to data generation, and also, as far as we can, with data interpretation. The biological or biomedical interpretation of the data has to be done by the research groups because we cannot have all the background necessary for that.
What kinds of research projects do your users come to you with? Do they include clinical research?
The largest fraction is basic research with very diverse backgrounds, ranging from agriculture research to ecology to model organisms and non-model organisms.
Clinical applications, so far, have come more from a basic clinical research area; there have been no diagnostic applications so far. This is not something we emphasize; it's probably something that dedicated facilities or sub-groups will deal with rather than us as a central platform.
Do you also conduct research of your own?
About 20 percent of our total activity is methods development. Much of it is being done in close collaboration with the users because we don't have biology in house. Sometimes we try to develop a custom protocol for the investigation of a biological research question that a group poses to us. Some of the developments are what we think is necessary for future applications, which is largely on the bioinformatics side, where we are probably more in a position to see what's necessary than the users.
Do you in any way compete with the universities' core facilities?
We try to strategically align our activities as far as possible. There is coordination at the level of the vice presidencies for research at both schools. As far as possible, there are no redundancies. We rather try to complement the portfolios, so if there are small groups or offices within departments that try to emphasize their specific needs or applications of their direct users, we try to do everything else.
How is the center equipped with next-generation sequencing platforms?
Currently, we have two SOLiD 5500xl, one HiSeq 2000, one Ion Torrent PGM, one 454 GS FLX, and one PacBio RS. Imminent future plans until the end of the year are to broaden the Ion Torrent platform with the Ion Proton, and very likely also to expand the capacity on the Illumina side with another HiSeq or maybe a MiSeq.
Where does the funding for these systems come from?
There is a mix of funding. Part of the infrastructure funding comes through services that we do on the systems. But the hardware largely comes through institutional funds.
Unlike most other places, you maintain a wide variety of sequencing platforms, rather than concentrating on one or two. Why did you decide to bring in so many different types?
This is largely due to the research groups that we support. Being responsible for the technology and methods support for two full-spectrum universities, we have literally everything that we try to support with these 'omics approaches, from biomedical or very applied clinical research to very basic research down to structural investigations.
In terms of applications, this means there is a huge variety in both project size and project heterogeneity, the depth of information that the different groups want to generate. Particularly in the first years of technologies — we saw this with microarrays, and we see it with next-gen sequencing, and we still see it with mass spectrometry-based approaches — as long as there is still huge development in the community, no standard solutions available but development efforts from many groups and companies, it's very advantageous to have multiple platforms in order to choose the optimal platform according to the needs of the research question, and not the other way around, which is a certain luxury issue. Most of the time, we have to make compromises, and we still have to, but we think that it's more efficient if we can support projects with a technology that ideally fits their needs in terms of the data, the cost, the timelines, et cetera. It's very likely that there will be a certain consolidation in the years to come; we saw this also with other technologies that have matured. But there is still a benefit in having multiple platforms.
How do you decide which platform to use for a certain project?
We normally talk with people first about their research needs. Most groups, when they submit their research project, actually suggest using one or the other platform, which is OK, but normally, we critically discuss this approach together with the group — whether this is really the ideal platform, whether there is a specific need for a certain platform. If they have already generated data on a certain platform, it is unlikely that we would switch, unless there is a huge performance increase to be gained. Also, there might be protocols only available for a certain platform; this would be a clear indication for just choosing one of the platforms.
Otherwise, we discuss what the group's needs are, and also project a bit what their future needs and potential applications will be, so if we choose a platform, we won't have to change it again very soon. To start such a discussion, we have a small logical decision tree, where we try to emphasize the strengths of the different platforms for different types of applications. In most cases, we would stick to that initial recommendation, maybe carry out a pilot study on multiple platforms and see which data set fits best, including all the bioinformatics analysis, which is often forgotten. And from such a pilot experiment, or from experience, we choose which platform to use in the future.
Are certain platforms more popular than others? Do all your instruments run at capacity?
There are always fluctuations on a monthly or annual basis. There are also certain new protocols that may push one or the other platform for a certain period of time, which levels out a bit. What we do see is that very high-throughput platforms, like the HiSeq for example, are very sought after, especially now that people have robust protocols available that allow for significant multiplexing. A lot of platforms that do not offer that, or to a lesser extent, are less well used.
What's astonishing is that one of the older platforms, the 454, is still very well used because it currently fills this niche that none of the other platforms has filled yet: longer read lengths on the order of 500 or 1,000 bases with significant throughput. Neither the PacBio, which generates larger fragments but fewer of them, nor the higher-throughput platforms, like SOLiD or HiSeq, currently fill this gap. It's something that probably will change with newer protocols on the MiSeq or the Ion Proton or the PGM. We will see a decrease in demand for the older platforms, but currently, there are still specialist niches for all the different platforms.
What has your experience been with the two newest platforms in your portfolio, the Ion Torrent PGM and the PacBio RS?
The PGM, for us, is kind of a trial platform, largely, for something high-throughput like the Ion Proton. Being a core facility, we try to combine and cluster projects to run them most efficiently on larger platforms. So while the primary market for the PGM is single institutes or larger research groups, we currently use it more for explorative work that would then tend to run on a more high-throughput platform. We like the PGM very much for the short turnover time. In order to optimize protocols, to QC samples and libraries, and especially for educational purposes, we think the PGM is just perfect because one can do workshops and courses, and students within a few days can actually generate their own data, which is not possible with any other platform. But the productive aspects will mostly come with an Ion Proton.
The PacBio, on the other hand, from an operational mode and the application we currently use it for, is in kind of a split mode. One is more production-oriented, which is largely for very small genomes, like viruses, where with circular consensus sequencing the accuracy goes up and we can do very easy, quick, and cheap analyses. Another application that is kind of in production mode is hybrid assemblies, or scaffolding runs, for de novo sequencing projects that use largely HiSeq, or, in some cases, SOLiD data, that you supplement with the large reads, including large insert libraries for scaffolding and clustering into contigs.
We have recently started to work on DNA modification analysis. Probably like many PacBio users, we think this system will be the ideal platform to do these things, and it's probably also going to be its key application in the future, apart from the very long read applications that can be done. Analyzing different modifications for sure would be very attractive to many users, so we really see this as a key argument for the PacBio.
Can you comment on the customer service from the different vendors, and their ability to fix problems in a timely manner? Have you noticed any differences?
It's hard to say. It largely goes with size, but not necessarily in the same direction. Standard problems or issues are solved by the larger companies very efficiently, based on their experience and the large support force they have. Once we hit very specific or maybe even individual issues or problems, we do see a difference in that the very large organizations that have grown very significantly in recent times, for example Illumina and Ion Torrent, have more difficulties in keeping up with individual requests, because people are busy fixing standard issues in so many places.
PacBio has been extremely helpful in solving individual issues or even coming up with customized solutions, discussing the customized use of the platform, which probably reflects the earlier stage of platform distribution, that they have more time and invest more time in individual customers than the bigger companies.
At what point do you decide to retire a sequencing platform? When do you expect that to happen for any of your current platforms?
We normally do it in a two-phase mode. First, we always try to get a successor platform that performs better or more cost-efficiently than the previous platform. So normally, we don't shut down workflows and leave the user without an alternative; rather, we first search for the alternative and then start to shut down the other platform.
Once we know that a platform is being phased out, we would not initiate new projects with these technologies. We still try to keep access to these platforms through either other core facilities or commercial sources, so that we ensure that all the users can finish running work on a certain platform if there is a need for it before we phase them out.
Presumably the next platform that we would tend to shut down is the 454, once there is a follow-up technology, either from Roche/IBM or from Ion Torrent or Illumina when they have longer-read protocols for their systems. This is something that I would expect to happen in the next 12 to 18 months, depending on the development speed of the others.
On Life Tech's side, it's probably clear that there will be a shift from the SOLiD platform to the Ion Proton platform in the future, so that probably the SOLiDs will then be the next one that we would phase out, for the benefit of Ion Proton systems.
How do you decide to bring in a new type of sequencing platform?
We try to be as early as reasonable. We don't have the capacity to play with alpha versions of the systems, but there needs to be, short-term to at least mid-term, production use of these platforms possible, because otherwise we cannot dedicate so much capacity for development work. This is different from large centers like the Broad or the Sanger or so, which have these resources.
We try to get systems that are leaving the beta system stage and are getting into early production phase, so that we can be reasonably sure to operate them rather reliably, to start with the first productive applications, but still have the chance to maybe influence some of the developments going on with those platforms and, more importantly, to gather experience working with these platforms, and the company, and the data, so that once the protocols reach a certain maturity, we are all ready to use them in production. In our experience, for a really new technology, it easily takes a year before we are in a state that we can tell users that we are confident running these applications on their behalf. So we try to be among the first getting a commercial platform.
Where do you see the greatest need in sequencing technology development — not only regarding the platforms but also the front-end sample preparation and the bioinformatics?
One of the biggest challenges is clearly lab automation and multiplexing of large numbers of samples. The capacity of the analytical systems is increasing so fast that having reproducible, robust lab protocols is becoming a limitation. Automation is one part, but in the end, automation will not solve everything. The protocols are still very complex, need a lot of QC, manual quality control steps. This is for sure something that's currently more throughput-limiting than the actual sequencing technology. If the whole lab workflow became even more robust, easier, and cheaper, then this would be an actual breakthrough.
In terms of data analysis, there is a reasonable set of tools available nowadays. But the big challenge is what tools to use for what purpose, in what configuration, with what parameters. There is also experience needed with different data types and tools on the market. Most of the vendors have by now understood that they cannot compete with the development pace of the community. They somehow surrendered to the community and let them do the job in terms of algorithm and software development, which is neat in terms of new functionalities and new ways of dealing with the data, but on the other hand, we don't only have research processes, we also have production duties. It's sometimes not scalable in terms of robustness of these tools, so we have to reimplement the software, recode some of the things connected to professional LIMS systems or data storage systems, and there is clearly not much support from the companies. There are also few alternatives in terms of commercial providers who are coming up with full-blown frameworks that could just be used out of the box.
What have you done so far in terms of sample-prep automation?
Due to the large heterogeneity of the samples and projects, automation has actually decreased over the past few years. Back in the earlier days, we had more automation available, but with an increasing number of platforms and protocols, we went back to manual preparation. But the next step will be to re-automate these protocols, now that throughput on the different protocols increases again. This will only be true for the major throughput-oriented applications, not the ones that emphasize flexibility and methods development.
What is your solution for bioinformatics? Do you develop your own tools, or do you operate commercial software?
We use a mixed approach. In terms of data analysis tools and frameworks, we try to exploit whatever the community offers. In terms of lower-level data conversion or transformation efforts, we largely use the vendor tools because they understand the data formats best. The lower level is typically vendor driven, but medium- or higher-level analysis is largely done using the open source community. We do support a few commercial data analysis packages, largely for our users, because they do not want to program or use tools at the command level, but they want to have a graphical user interface and that's something that the open source community does not provide much. In terms of data management, we have a framework that we developed ourselves that deals with all the data and project information, user administration, et cetera.
Is there anything else you would like to mention?
The challenge to come up with complete solutions from the viewpoint of the user is something that we currently haven't achieved yet due to the high development speed of the technologies, the protocols, and the bioinformatics tools. The major challenge I see is currently really to come up with consistent, reliable workflows from beginning to end that the user can make sense of, because most users are just overwhelmed with the amount and complexity of the data. We have to provide some robust ways, so we can make sure that they get what they need without having to deal with all the nitty gritty details.