Director, Center for Infection and Immunity
Name: Ian Lipkin
Title: Director, Center for Infection and Immunity; Professor of epidemiology, neurology, and pathology, Mailman School of Public Health and College of Physicians and Surgeons, Columbia University, since 2002; Scientific director, Northeast Biodefense Center, since 2002
Experience and Education:
- Professor of neurology, anatomy and neurobiology, microbiology and molecular genetics, University of California, Irvine, 1990-2002
- Fellow, Scripps Research Institute, 1984-90
- Resident in neurology, University of California, San Francisco, 1981-84
- Resident in internal medicine, University of Washington, 1979-81
- Internship in medicine, University of Pittsburgh, 1978-79
- MD, Rush Medical College, Chicago, 1978
- BA in cultural anthropology, philosophy, and literature, Sarah Lawrence College, 1974
Ian Lipkin has a track record in the molecular study of pathogens. In 1989, using subtractive cloning, he was the first to identify Bornavirus, the previously unknown microbe that causes Borna disease, a neurological illness first described in horses and sheep.
Ten years later, Lipkin led a team of researchers who identified West Nile virus in the brains of encephalitis victims in New York. In 2003, his lab sequenced a portion of the severe acute respiratory syndrome (SARS) coronavirus directly from lung tissue and built a sensitive assay for infection with the virus.
Last week, a group of researchers headed by Lipkin reported using 454’s sequencing technology to characterize microbes in honeybees affected by colony collapse disorder. Their study appears in the current issue of Science.
In Sequence talked to Lipkin this week about the role of next-generation sequencing technologies in studying microorganisms linked to disease.
Tell me about your recent Science paper on honeybee microbes and colony collapse disorder. How did that study come about, and what role did 454’s sequencing technology play in this?
The initial contact was made in December of 2006. Shortly after I gave a talk at the Institute of Medicine [of the National Academies], Diana Cox-Foster [a professor of entomology at Penn State], who is a member of a working group on colony collapse disorder, or CCD, contacted me to request help with identifying viruses and other pathogens using chips that we had built, for which we have an extensive database and a printing arrangement with Agilent Technologies.
Given that we don’t have much expertise in insect pathogens, and because that particular set of arrays is really designed for looking at vertebrate pathogens, I recommended that we instead move to an unbiased high-throughput sequencing platform, i.e., 454. I have now been involved with 454 for three years. Prior to its acquisition by Roche, I was one of two infectious disease people on the company’s scientific advisory board.
I had been impressed with the way in which the sequence reads were becoming longer and more reliable. We had already had some success looking at outbreaks of acute disease in humans and in animals (that’s work that will be coming out shortly), so CCD seemed like a natural extension.
The fact that the honeybee, or Apis mellifera, genome had been sequenced meant that the algorithms that we use to decide what’s host vs. what’s non-host were straightforward to implement because we could simply subtract the Apis sequences.
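Host subtraction, as described here, means discarding reads that match the host (here, Apis mellifera) reference so that what remains is candidate non-host sequence. The sketch below illustrates the idea with exact k-mer matching on both strands; this is an assumed stand-in for whatever alignment-based method the authors actually used, and the function names, the k value, and the `max_host_fraction` cutoff are hypothetical.

```python
# Toy host-subtraction filter (illustrative sketch, not the authors' pipeline).
# Assumption: exact k-mer matching stands in for alignment-based subtraction;
# all names and thresholds here are hypothetical.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA/cDNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty set if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_index(host_contigs: list[str], k: int) -> set[str]:
    """Index every k-mer of the host reference on both strands."""
    index: set[str] = set()
    for contig in host_contigs:
        contig = contig.upper()
        index |= kmers(contig, k)
        index |= kmers(reverse_complement(contig), k)
    return index


def subtract_host(reads: list[str], host_index: set[str], k: int,
                  max_host_fraction: float = 0.5) -> list[str]:
    """Keep reads whose fraction of host k-mers is at most max_host_fraction."""
    kept = []
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        if not read_kmers:
            continue  # shorter than k: cannot be assessed, so drop it
        host_fraction = len(read_kmers & host_index) / len(read_kmers)
        if host_fraction <= max_host_fraction:
            kept.append(read)
    return kept
```

Indexing both strands matters because cDNA reads can come from either strand of the host transcript; a real genome-scale index would use a dedicated aligner rather than a Python set.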
We examined RNA extracted from CCD hives and non-CCD hives, as well as royal jelly. The first thing we found is that a wide range of microbes are present in both CCD and non-CCD hives, and that the diversity in microflora is similar in Apis mellifera around the world. However, we also observed a trend towards increased abundance of one of the Gammaproteobacterial taxa in the CCD bees.
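A "trend towards increased abundance" is usually expressed as a taxon's share of classified reads in one group relative to another. The sketch below shows that arithmetic under stated assumptions: reads have already been assigned to taxa, a pseudocount is added to avoid division by zero, and the function names are hypothetical. A ratio on its own is not a significance test.

```python
# Sketch: compare one taxon's share of classified reads between two groups.
# Assumptions: reads are already assigned to taxa; the pseudocount and the
# function names are illustrative; this is not a statistical test.

def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of classified reads assigned to each taxon."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def abundance_ratio(case: dict[str, int], control: dict[str, int],
                    taxon: str, pseudocount: float = 1.0) -> float:
    """Relative abundance of `taxon` in case vs. control samples.

    A pseudocount (> 0) is added to every taxon seen in either group so that
    a taxon absent from the control does not cause division by zero.
    """
    taxa = set(case) | set(control) | {taxon}

    def fraction(counts: dict[str, int]) -> float:
        total = sum(counts.get(t, 0) + pseudocount for t in taxa)
        return (counts.get(taxon, 0) + pseudocount) / total

    return fraction(case) / fraction(control)
```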
We also looked for pathogens that others had proposed were likely to be unique to CCD. They were not significantly associated with CCD, but we found a virus that was: IAPV, Israeli acute paralysis virus. There are differences in sequence between the virus we found and the one originally reported in Israel. It’s premature to talk about that now, but it may well explain differences in the phenotypes of bees infected with IAPV here vs. in other parts of the world.
At present, IAPV is a marker. We don’t have proof that this is the causative agent. It may be, and this is something we need to explore. However, we have not said that, despite media reports to the contrary.
I want to stress that the approach we took here provides an excellent roadmap for how to approach challenges in pathogen surveillance and discovery in outbreaks of infectious disease. This technology allows you to simultaneously survey for the presence of a wide variety of pathogens across the tree of life.
The reason we chose to examine RNA instead of DNA is that ribosomal RNA is well characterized for bacteria and for a variety of other species. Also, if we used DNA alone, we would not be able to get at RNA viruses. We developed algorithms that allow us to subtract any of the sequences that might have been used to generate the amplification products, assemble contiguous sequences, subtract the host sequences, and then finally to do an analysis by Blast at the nucleotide and the protein levels to identify either relationships to known infectious agents or to recognize and appreciate sequences as novel.
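The first step of the pipeline described above, subtracting the sequences used to generate the amplification products, amounts to trimming primer or adaptor sequence from the ends of each read before assembly. The sketch below assumes a single known primer, exact matching only (production trimmers tolerate mismatches), and a hypothetical `min_overlap` cutoff for partial matches at the 3' end; it is not the authors' algorithm.

```python
# Sketch of trimming an amplification primer from reads before assembly.
# Assumptions: one known primer, exact matching only; names are hypothetical.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA/cDNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_primer(read: str, primer: str, min_overlap: int = 6) -> str:
    """Strip the primer from the 5' end and its reverse complement
    (full, or partial with at least min_overlap bases) from the 3' end."""
    read, primer = read.upper(), primer.upper()
    rc = reverse_complement(primer)
    if read.startswith(primer):
        read = read[len(primer):]
    # The read may end partway through the reverse-complemented primer,
    # so try the longest prefix of rc that is a suffix of the read first.
    for n in range(min(len(rc), len(read)), max(min_overlap, 1) - 1, -1):
        if read.endswith(rc[:n]):
            return read[:-n]
    return read
```

Lowering `min_overlap` trims more partial primer remnants but also raises the chance of clipping genuine sequence that happens to match by chance.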
What are the pros and cons of using sequencing over other molecular techniques, like microarrays?
Microarray experiments can indicate binding events to related microorganisms. However, unless you use specific short oligonucleotide arrays that discriminate and speciate and subspeciate, you are not going to know precisely what taxon you find. Problem number two is, if the agent is only very distantly related to printed probes, you may not see it at all. Problem number three is, there is no comprehensive pan-microbial array for all hosts. We have built, to my knowledge, the only pan-microbial array that will detect all bacteria, all viruses, all fungi, and all parasites in vertebrates. However, the density of that array is already approaching the limits of what we can print. We could not possibly print a single array that would allow you to address all insect species as well. We can print subsets, for example a viral array that covers all known viruses, or an array that covers all prokaryotes. But comprehensive coverage would require a large number of arrays. And then at the end of the day, you would not know specifically what the hybridization meant, unless you had extremely dense arrays that addressed sequence changes that could reveal differences in phylogeny.
In contrast, high-throughput sequencing technology allows us to rapidly obtain definitive information for any agent that is represented in either the protein database or the nucleotide database.
In each of these instances, whether we use microarrays or sequencing, it’s important to stress that all we are collecting is a qualitative snapshot of what’s present in the sample. We then need to go back and test the validity of the results, quantitate burden, and obtain longer sequences that can be used to differentiate agents. In our paper, although we found a virus that clearly looked like IAPV, we then went on to clone and sequence longer segments using standard technologies, which gave us a more precise view of its evolutionary history and relationship to other strains.
What we have done in another application that’s now in press is to rapidly sequence an entirely new genome. In that instance, we first propagated the agent in cell culture, collected the purified RNA out of the supernatant, and analyzed it by pyrosequencing.
When we work with clinical materials, we prefer to obtain samples from acellular compartments. In the honeybee example, we were grinding up whole bees, so much of the sequence data went to the bee genome rather than to the pathogen. In contrast, if you analyze human serum, plasma, spinal fluid, or urine, you can rapidly get large amounts of sequence data that are specific for the microbe.
Others have tried to clone smaller cDNAs and characterize them with traditional sequencing, but that approach is tedious, and frankly, it’s not capable of generating the abundance of sequence data needed to rapidly identify new pathogens. That’s not to say that 454 is the last word. Many people will be trying to approach pathogen discovery problems using other platforms. But right now, we have been quite successful with 454.
What about other platforms?
We are not wedded to a platform. Our focus is answering questions about infectious diseases and supporting our colleagues within the World Health Organization who must address outbreaks. To save time and resources, we deploy a staged strategy for surveillance and discovery.
We begin with simple multiplex methods, such as MassTag PCR. If this fails, we proceed to microarrays, and then if those fail, we move on to high-throughput sequencing. MassTag PCR takes only a few hours, can test for 20 to 30 different pathogens in one run, and is extremely sensitive and quite inexpensive.
The arrays cost an order of magnitude more, and they take a day instead of a few hours. And high-throughput sequencing is yet another order of magnitude in expense and needs several days.
We cannot use high-throughput sequencing for everything. We reserve it for those situations where either the other methods fail, or we need to characterize all microflora, and cannot restrict our search to a group of specific agents.
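The staged strategy described above is an escalation rule: run the cheapest assay first and move to a costlier tier only if nothing is found, unless a full census of the microflora is needed, in which case go straight to sequencing. The sketch below encodes that rule; the relative costs (1, 10, 100) follow the "order of magnitude" steps mentioned in the interview, while the `Tier` type, `staged_search`, and the assay callables are hypothetical stand-ins.

```python
# Sketch of the staged surveillance strategy: escalate to a costlier tier
# only when cheaper ones find nothing. Costs are relative units; the assay
# callables are hypothetical stand-ins for real laboratory workflows.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tier:
    name: str
    relative_cost: int                   # MassTag PCR = 1; later tiers ~10x each
    turnaround: str                      # e.g. "a few hours", "a day"
    run: Callable[[str], Optional[str]]  # returns an agent name, or None


def staged_search(sample: str, tiers: list[Tier],
                  need_full_census: bool = False
                  ) -> tuple[Optional[str], Optional[str], int]:
    """Run tiers in order until one reports an agent.

    If all microflora must be characterized, skip straight to the last
    (sequencing) tier. Returns (tier name, agent, cumulative relative cost).
    """
    if need_full_census:
        tiers = tiers[-1:]
    spent = 0
    for tier in tiers:
        spent += tier.relative_cost
        agent = tier.run(sample)
        if agent is not None:
            return tier.name, agent, spent
    return None, None, spent
```

Because each tier costs roughly ten times the previous one, a sample resolved at the first or second tier avoids most of the sequencing expense, which is the rationale the interview gives for reserving sequencing for the hardest cases.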
In what other studies have you used the 454 technology?
We have used it to investigate respiratory disease, encephalitis, meningitis, hemorrhagic fever, and transplant-associated disease, as well as unknown viruses growing in tissue culture. We have several studies that will be coming out shortly.
Do you have a 454 instrument in-house at Columbia?
No, we don’t. We work directly with 454. It’s working beautifully as a collaboration because they have a spectacular team.
Are you thinking of acquiring a next-generation sequencer sometime in the future?
I am sure we shall. Right now, we prefer to work on the R&D with the team at 454. With Michael Egholm [at 454] and others, we have been helping to develop new algorithms that can be used to analyze sequence data for diagnostics and surveillance. 454 really had not been used for discovering new pathogens, so we specifically developed a bioinformatic pipeline to facilitate that and streamline the process.
Additionally, we have been working on methods for sample preparation that are designed to facilitate amplification of nucleic acids that can be sequenced directly. At present, we first have to ligate amplification products onto adaptors. We have developed an alternative strategy that should be more efficient.
Do you believe next-generation sequencing will become a routine tool in the future to characterize pathogens?
Yes, I do. It will become a powerful tool, not only for the discovery of new pathogens but also for characterization of known pathogens and for investigation of outbreaks of infectious disease.
Another application where high-throughput sequencing will be very helpful will be in understanding the role of microflora in normal development and in chronic disease. Increasingly, we are beginning to appreciate that complex changes in microflora may have long-term effects on physiology, such as endocrine, metabolic, cardiovascular, and central nervous system disorders. The models that are beginning to evolve will require high-throughput sequencing. We are not going to be able to do this using other platforms.