Among all the microbes living in the human gut, viruses also lurk. The University of Pennsylvania School of Medicine's Frederic Bushman and his colleagues characterized that fraction — the virome — in a PNAS paper published online in late February. He and his team examined sequence variability in DNA viruses in the human gut and described the regions of hypervariability they found. Genome Technology's Ciara Curtin recently spoke with Bushman about the virome. What follows is an excerpt of their conversation, edited for space.
Genome Technology: Is there an estimate of how many viruses there are in the human gut?
Frederic Bushman: No, it's not really clear. Think of a rank abundance curve where the first column on the left shows the most abundant; and the next one, the next most abundant; and the next one, the next most abundant; and the Y-axis is the proportion of all viruses. As you go out, out, out, you get lower, lower, lower. But it continues a long way. In other words, there are rare viruses in the gut and common viruses in the gut. And even with very large sequence samples, you are still getting some evidence for more rare groups. We can make some kinds of estimates, but their accuracy is fairly questionable.
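The rank abundance curve Bushman describes can be sketched in a few lines of Python. This is only an illustration: the function name `rank_abundance` and the toy read labels are hypothetical, not taken from the study. It shows how classified reads turn into the ranked proportions he has in mind.

```python
from collections import Counter

def rank_abundance(labels):
    """Return (rank, label, proportion) tuples, most abundant group first."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Sort by descending count; ties are broken by label so output is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, label, count / total)
        for rank, (label, count) in enumerate(ordered, start=1)
    ]

# Hypothetical toy sample: each entry stands for one read assigned to a viral group.
reads = ["phageA"] * 6 + ["phageB"] * 3 + ["phageC"] * 1

for rank, label, proportion in rank_abundance(reads):
    print(rank, label, round(proportion, 2))
```

In a real sample the tail of this list keeps growing as more reads are sequenced, which is why an estimate of the total number of viral groups is hard to pin down.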
GT: For your PNAS study, why did you focus on the virome aspect of the human gut microbiome?
FB: We have other studies going in the lab on bacterial populations, and fungal populations, and shotgun metagenomics. This was a specific look at the viral piece. If you look at everything, which you can do, you can take DNA from poop, extract it, fragment it, and sequence it (we've done that, and lots of others have done that), and you can see sequences that look like bacteria and you can see sequences that look like viruses. But bacterial viruses can often integrate into bacterial cell DNA, and the bacterial viruses' sequences will often resemble those of their host. There's a very energetic exchange of genes back and forth between bacterial viruses and their bacterial hosts. So it's hard to be sure in a giant sequence mishmash which is virus and which is host.
What we did in the [PNAS] study was [take] highly purified viral particles, which we know are viral particles because of the physical methods used for capture. That is, we solubilize poop, run several filtration steps, do density gradient banding, pull out the fraction that has the density of viral particles, then treat with nuclease that removes any free DNA, then break open the viral particles and get out the DNA that was inside. So, because of the fractionation used and a bunch of control experiments, we have strong evidence that what we've got really, really, really are viral particles. That allows us to ask questions about viral populations selectively. Part of the reason for wanting to do that is that they have a lot of unique properties. As [Penn's] Sam Minot showed in his PNAS paper, we can find small regions of extremely high rates of mutation that are associated with reverse transcriptase, and seem to be produced by a really unique mechanism.
GT: How do these variable regions help the viruses? Do they help them evade detection?
FB: Evading detection is one candidate model, but probably not the most likely. In HIV, for example, sequence variation happens at a high level and that's associated with evading the immune response, but that is probably not true in phages. The reason I say that is wonderful work from Jeff Miller at UCLA, who has studied one of these systems in great detail and really figured it out beautifully. There's a phage that infects Bordetella called BPP-1 that Miller studied, and Bordetella undergoes phase variation. It changes surface proteins periodically, evading immune responses among other things. Now, the phage changes along with it. The phage has to bind to these surface proteins to get into the cell, and those surface proteins are changing, so the phage is hypervariablizing the gene encoding the protein at the very tips of its tail fibers that bind to the receptor, and that allows the phage to change its tropism. The way it works, there's a gene for the tip of the tail fiber called MTD, the major tropism determinant. There's a hypervariable region there, and that region is duplicated nearby: a second copy of the same sequence. And then there's the reverse transcriptase gene. The template copy (the additional copy) is transcribed and then reverse transcribed in a highly error-prone fashion, and the copies are absorbed into the MTD locus, and that's how you achieve the hypervariation at that target site. Miller has done a great job showing how this works.
We found lots of these kinds of systems. Because we had extremely deep sequencing data … we can see these types of variable regions and we see ones that are like Miller's, so they look like the MTD gene in phage BPP-1. But we see others that are completely different, where it looks like Ig-fold proteins, for example, are encoded in some of the genes that are hypervariablized, which is a really cool finding because we are seeing the same protein scaffold, the seven-strand, all-beta, Ig-beta sandwich kind of fold, being hypervariablized by phage. And it's exactly the fold that is hypervariablized in antibodies and the T-cell receptor in the immune system. Convergent evolution has arrived at the same solution twice: how to display amino acid sequences that are getting hypervariablized at an extremely high rate. Probably something about the all-beta protein fold makes it really, really stable so that you can still maintain that fold while changing the amino acid sequence. It's really cool. There are all these new kinds of genes now that seem to be undergoing targeted hypervariation in a reverse transcriptase-dependent fashion.
Some of these Ig-fold proteins, we don't really know what they are doing, but there's one that has been studied carefully called T4-Hoc. This is an Ig-fold protein in phage that is not hypervariablized, but at least there is some indication of what its function is. It's a dispensable head protein. It binds to six-fold symmetric vertices on bacteriophage heads, so that there are many, many copies of this protein on the outside of bacteriophage heads. Think of little lunar landers with lots and lots of these proteins decorating the icosahedral head part. The outer parts of these proteins are changing and they can change to be more negatively charged, more positively charged, hydrophobic, or hydrophilic, and the avidity effects will be huge because there are so many copies on the phage head. According to this model, if it is right, the phage is evolving to be able to bind lots of different things. What those things are could differ in different biological settings. It could be that the phage is binding to cells that it wants to infect. That's probably part of it. But also inside the human body versus outside the human body, maybe there are different things it has to bind to. Maybe in the environment it binds to a passing kelp leaf or a mineral surface or whatever. Maybe that will allow it to leave more offspring in the next generation, because this hypervariable thing is happening and it is changing so fast that the phage can respond to its environmental conditions by changing what it is adhering to.
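The template-copy mechanism Bushman attributes to Miller's BPP-1 work (an unchanged template is copied in an error-prone way and the copy overwrites the variable region of the target gene) can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the function names, the short sequences, and especially the uniform per-base error rate, which is a simplification and not a claim about how the real reverse transcriptase behaves.

```python
import random

BASES = "ACGT"

def error_prone_copy(template, error_rate, rng):
    """Copy a sequence, substituting each base with probability error_rate."""
    copied = []
    for base in template:
        if rng.random() < error_rate:
            # Substitute with one of the three other bases (toy error model).
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)

def diversify(genome, target_start, template, error_rate, rng):
    """Overwrite the variable region with an error-prone copy of the template."""
    new_region = error_prone_copy(template, error_rate, rng)
    end = target_start + len(new_region)
    return genome[:target_start] + new_region + genome[end:]

# Hypothetical toy genome: flanking sequence around a 12-base variable region.
rng = random.Random(0)
template = "ACGTACGTACGT"
genome = "TTTT" + template + "GGGG"

for generation in range(3):
    genome = diversify(genome, 4, template, 0.3, rng)
    print(generation, genome[4:16])
```

Because the template itself is never modified, every round starts from the same sequence, so the variable region keeps changing while the scaffold around it stays fixed.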
GT: What's your next step?
FB: One thing we want to do is get some phages that have these systems on them. We've already cloned out a bunch of these genes that are hypervariablized targets and we want to build up more of a picture of the structure and function of these systems. We've argued that we are seeing genes that differ from Miller's MTD, and we want to strengthen that by getting some X-ray structures of the proteins involved and showing the kinds of scaffolds that are supporting hypervariation in these systems. Also, as we generate more viral populations, we'll be analyzing how phages may be varying across human body sites, different disease states, et cetera.