Name: Robert Holt
Title: Senior scientist and head of sequencing, British Columbia Cancer Agency's Genome Sciences Centre, Vancouver, since 2002
Associate professor of molecular biology and biochemistry, Simon Fraser University
Experience and Education:
Senior Scientific Operations Manager, Celera Genomics, 1998-2002
Postdoc in molecular evolution, State University of New York, Albany, 1998
PhD in pharmacology, University of Alberta, 1998
BSc in general science, University of British Columbia, 1992
Rob Holt has been heading sequencing at the British Columbia Cancer Agency's Genome Sciences Centre since 2002, after spending four years at Celera Genomics. Earlier this month, he and his colleagues published a study online in Genome Research in which they used Illumina's Genome Analyzer platform to profile the repertoire of T-cell receptor beta-chains. The T-cell repertoire is necessary to mount an immune response against a diversity of antigens. In Sequence spoke with Holt last week about his work, and how sequencing will soon be able to provide new insights into immunology.
Tell me about your study. How many different T-cell receptors are there, and how can new high-throughput sequencing technologies be used to profile them?
The T-cell repertoire is quite large. It has not been known with any certainty exactly how large; published estimates have been on the order of 10^6 to 10^8 different T-cell clonotypes, or specificities, in a given individual at a given time.
T-cells represent the cellular immune system that recognizes antigens that are presented by, in humans, the human leukocyte antigen, or HLA, which is known in other species as MHC, or major histocompatibility complex. Foreign organisms are engulfed, digested, and their peptide digestion products are presented at the cell surface by the HLA or MHC. The complex, with the peptide antigen, is recognized by T-cells, specifically by the T-cell receptor.
The receptor is a heterodimer, which has an alpha chain and a beta chain for most T-cell receptors. An enormous structural diversity in the receptor is necessary to recognize all possible antigens that could be derived from the environment — bacteria, viruses, or even altered self, like coding mutations in cancer. In order to generate the structural diversity, the T-cell locus undergoes somatic recombination. You often think about the human genome as being somewhat static, and the notion of a single reference genome holds for the vast majority of loci. But [for] at least two loci, the regions that encode the T-cell receptor and antibodies, the human genome is really a metagenome. The T-cell recombination activity occurs in the thymus during development as T-cells mature.
Nobody has really been able to do a sequence-resolution study of T-cell diversity, simply because of cost. To think about it, if there are really 1 to 10 million, or perhaps even more, different T-cells in peripheral blood (and theoretically, the number could be much higher, if you consider all the different mechanisms of recombination), then to approach that with Sanger sequencing, where costs have been 50 cents to a dollar per read at the most efficient high-throughput centers, would be very expensive.
Because it is a shotgun sequencing approach, you would need to sequence with some redundancy, say 5- or 10-fold coverage, so you would be looking at a $10 million project with Sanger sequencing, even to just profile a single individual at a single time point. That's why the new short-read sequencing platforms have really been enabling for looking at sequence-level resolution at these loci.
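[Editor's note: The arithmetic behind that estimate can be sketched roughly as follows. This is a back-of-the-envelope illustration only. It assumes one Sanger read per clonotype per fold of coverage, which is a simplification, and the function name is hypothetical. The input ranges are the ones Holt cites above.]

```python
def sanger_cost_usd(clonotypes, fold_coverage, cost_per_read_usd):
    """Rough project cost, assuming one read per clonotype per fold of coverage."""
    return clonotypes * fold_coverage * cost_per_read_usd

# Low end: 1 million clonotypes, 5-fold coverage, 50 cents per read.
low = sanger_cost_usd(1_000_000, 5, 0.50)
# One combination that lands on the $10 million figure quoted above.
mid = sanger_cost_usd(1_000_000, 10, 1.00)
# High end: 10 million clonotypes, 10-fold coverage, a dollar per read.
high = sanger_cost_usd(10_000_000, 10, 1.00)

print(f"${low:,.0f} to ${high:,.0f} (e.g. ${mid:,.0f})")
```

[Under these assumptions, the quoted $10 million sits toward the lower end of the range for a single individual at a single time point.]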
How did you go about your study?
The region of the T-cell receptor sequence that is informative is called the CDR3 region. It's the site that makes direct contact with antigen, so it's the most highly variable, and it's the specific site of recombination, where the junction of the different gene segments of the T-cell receptor that are somatically recombined join together. By generating sequence diversity at that site, you generate structural diversity in the receptor at the cell surface.
The CDR3 region is about 100 bases long, but in order to get to that 100 bases, you have to put PCR primers flanking that informative sequence further apart, so it's really only practical to generate an amplicon that is on the order of 500 bases long, because you need to find unique priming sites. You clearly can't get across a 500-base pair amplicon with a single short read from any of the new platforms, except perhaps 454 might be approaching that, but the cost of 454 sequencing isn't competitive on a per-base basis with Illumina or ABI SOLiD sequencing.
We use a RACE approach, so we are sampling cDNA using an RT-PCR approach, to generate an amplicon that contains the CDR3 sequence, and then we just do a shotgun sequencing project, where we take that amplicon, shear it into small fragments, add Illumina adaptors, and sequence to a given depth. We used both 50- and 75-base pair reads in the current study.
Can you briefly talk about the data analysis?
We used iSSAKE, a short-read assembly software package that is based on SSAKE, which our group put together almost two years ago. We modified that code for this specific assembly problem. It's a fairly straightforward assembly application, where you start where you have known conserved sequence in the receptor, towards the 5'-end of the amplicon, and you find sequence reads that map there but then have variable sequence at their ends, meaning they are starting to bridge into the highly variable, somatically recombined region of the receptor. And then we simply do a seed-and-extend-based assembly, so we walk across the highly variable region and assemble everything we can. Then we look into contigs, and the depth of the contigs is proportional to the number of T-cells that had receptors with that specific sequence in peripheral blood, so that's how we can quantify the abundance of different T-cells.
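[Editor's note: The seed-and-extend idea described above can be illustrated with a toy sketch. This is not iSSAKE or SSAKE code. The function names, the greedy longest-overlap rule, the minimum overlap of 4 bases, and the example sequences are all assumptions made for illustration. The sketch follows a single clonotype and ignores sequencing errors and branching between clonotypes.]

```python
def extend_seed(seed, reads, min_overlap=4):
    """Greedily extend `seed` in the 3' direction using overlapping reads."""
    contig = seed
    unused = list(reads)
    extended = True
    while extended:
        extended = False
        # Try the longest possible overlap first, then shorter ones.
        for k in range(len(contig), min_overlap - 1, -1):
            suffix = contig[-k:]
            for read in unused:
                # The read must start with the contig's suffix and add new bases.
                if len(read) > k and read.startswith(suffix):
                    contig += read[k:]
                    unused.remove(read)
                    extended = True
                    break
            if extended:
                break
    return contig


def supporting_reads(contig, reads):
    """Count reads fully contained in the contig (a crude proxy for depth)."""
    return sum(1 for read in reads if read in contig)


# Hypothetical conserved seed and short reads; not real TCR sequence.
seed = "GGAGCTC"
reads = [
    "GCTCTGCA",
    "TGCAAGTT",
    "TGCAAGTT",  # duplicate read, adds to depth
    "AGTTACCG",
    "GTTACCGA",
    "TTTTTTTT",  # unrelated read, never used
]

contig = extend_seed(seed, reads)
support = supporting_reads(contig, reads)
print(contig, support)
```

[In this toy setting, the read count per contig stands in for the contig depth that Holt describes as proportional to the abundance of a given T-cell clonotype.]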
David Haussler's group at the University of California, Santa Cruz, is developing an "Immunobrowser" application for the UCSC site, and we will be posting our data there. It sounds very interesting and will meet an important need for the community as genome-scale immunology data continues to be generated.
Why did you decide to pool T-cells from several hundred individuals for this study?
At this point, it's been largely methods development, so we needed a lot of material to work at optimizing the method. The other reason is, we thought it would be more useful as a first experiment to just get a survey of population diversity. If we analyzed peripheral blood for the repertoire of a single individual, we would have no idea if that was representative of the population in any way, for example the V(D)J usage statistics and length variation statistics. Here, we don’t know what the individual variation is, but we at least get a sampling of what would be considered a fairly representative population, with 550 people contributing to the pool.
Next, we want to look at individuals, see how the repertoire varies from one individual to another as a function of their age, for example. There is a period of T-cell maturation and development in the thymus in neonates, but even after one year of age, your thymus starts to degenerate — that's called thymic involution — and as you age, you are less able to generate naïve T-cells. Towards the end of adolescence, you have a marked decrease in the ability to generate naïve T-cells. You have a burst of development of naïve cells as a neonate, and those, by and large, provide the repertoire that you have for the rest of your life.
There are some interesting implications for that. As you encounter pathogens, or self-mutations, over the course of your life span, you are going to consume that repertoire; it will become dedicated to specific antigens and differentiate into effector and memory cells. That's good, because it gives you protective immunity in case you see that mutation or pathogen again, but the thinking in the field and the evidence suggests that as the memory compartment increases, it does so at the expense of the naïve compartment. So you are generating fewer naïve cells as you age, and they are becoming replaced, largely, by memory cells. That means [that] as you age, you have a reduced ability to mount a response to new mutations, infectious agents, or any sort of immune challenge. That's probably why influenza and some other infections are more serious in elderly people.
And there is a school of thought out there that suggests that one of the reasons we are more likely to get cancer as we age is not just because we are more likely, by chance, to incur mutations in oncogenes or tumor suppressor genes, but because we are also less able to mount an immune response against those altered-self mutations. So it's really both of those things that conspire to increase your likelihood of ultimately getting cancer.
How much of the T-cell repertoire did you actually sample in your study?
The next technical challenge is to do exhaustive sequencing of an individual or several individual repertoires, and by that I mean, just sequence with such redundancy that we know we have captured everything present in our sample of peripheral blood. Until we do that, we don't actually know with sequence-level resolution what the size and diversity of the repertoire typically is. We can conceive of future studies that we can do qualitatively, but in order to develop budgets and research plans, we need to know how deep we will likely have to sequence to get saturation, and we don't know that yet. That's underway right now.
Why did it take so long for a study like yours to be done? These technologies have been around for a few years now.
That's a good question. There are other groups working on it as well. Andy Fire at Stanford is looking at the antibody repertoire in humans, and Steve Quake's group [at Stanford] has looked at the antibody repertoire in zebrafish [a study published in Science last month]. So it's starting to happen.
But the disciplines of immunology and genomics aren't, in my observation, as closely aligned as, for example, oncology and genomics, which have been for quite some time. I think people in the immunology community are quite often focused on cellular assays, although there is a community of immunogeneticists that have worked out all these mechanisms of somatic recombination that have just been fantastic advances. So I think it's just a matter of time before the sequencing technology sort of permeates that broader field of immunology, and there will be some quite interesting findings, I'm sure.
What technology improvements could make it even easier to undertake studies like this?
Longer reads, of course. If we get reads that can span the entire amplicon that contains the CDR3 region, then we could sequence much deeper with a smaller number of reads, because we would not need a large number of reads that we would assemble to get a single sequence. So any increase in read length would be very helpful.
The fidelity of polymerases and the accuracy of sequencing are also quite important. We can only identify a specific clonotype as a unique T-cell receptor sequence if we are confident in the accuracy of that sequence, because if you have two sequences that differ by one nucleotide, it's impossible to tell if that difference is a sequencing error versus a real difference that occurred during recombination.
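[Editor's note: The ambiguity described above, two sequences that differ by a single nucleotide, can be made concrete with a small sketch. It only flags such pairs and does not decide which one is real. The function names, sequences, and read counts are hypothetical and are not taken from the study.]

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ambiguous_pairs(clonotype_counts):
    """Return pairs of equal-length sequences that differ at exactly one base."""
    pairs = []
    for a, b in combinations(sorted(clonotype_counts), 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            pairs.append((a, b))
    return pairs


# Hypothetical assembled sequences with their read counts.
counts = {
    "TGTGCCAGCAGC": 120,
    "TGTGCCAGCAGT": 2,   # one base away from the sequence above
    "TGTGCCTCCAGC": 95,
}

flagged = ambiguous_pairs(counts)
for a, b in flagged:
    print(a, counts[a], b, counts[b])
```

[A pair like the one flagged here, where one sequence has very few supporting reads, is the case in which a sequencing or polymerase error cannot be ruled out without more redundancy.]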
One important thing I did not mention yet is that it's not just recombination among different gene segments that gives the structural diversity to the T-cell receptor, but at each of those recombination junctions, there is random deletion and then re-addition of nucleotides by an enzyme called terminal deoxynucleotidyl transferase, TdT. And it's really that random nucleotide removal and addition that gives most of the structural diversity of the T-cell receptor. So you need to be quite certain that what you are seeing is real sequence, not sequence due to polymerase errors, and the only way to do that is to get adequate redundancy. But certainly any improvements in quality-calling or base-calling algorithms would be helpful as well.
What are some possible applications of TCR repertoire profiling by sequencing?
One thing we are interested in doing is exploring the impact of chemotherapy on the immune repertoire. There is some evidence that cytotoxic agents used in chemotherapy are efficacious not just because they kill tumor cells, but because in the process of killing tumor cells, tumor antigens are released from those dying cells and presented to the immune system. So it's really a fairly crude but effective way of stimulating immune response against tumors. There is also the possibility that the cytotoxic agents themselves are creating additional mutations in those cells, which would lead to more novel antigens. So we want to look at the immune repertoire before, during, and after courses of chemotherapy in different cancers, and see if we can correlate any changes in the repertoire with outcome.
In terms of biomarkers, it is useful if you can find a T-cell in the repertoire that is responsive to something — whether it's infection with a virus, bacterium, or a fungal agent, or vaccination — or that is associated with the development of an autoimmune disorder, or that appears to be responsive to cancer. For example, if there are lymphocytes that infiltrate the tumor, it's been known for some time that that leads to a better outcome.
So if you could find out what those T-cells were, in any of these cases, then that T-cell really becomes a diagnostic marker. For the first time now, we have the ability to look, with sequence-level resolution, at the repertoire and find T-cells that could be associated with any of these conditions I just mentioned. And if you know the sequence, you should be able to use that sequence as a biomarker and correlate it with diagnosis. The way, in a practical sense, to use it as a biomarker is to develop a quantitative PCR assay against that specific sequence that comprises the CDR3 region. We are working on optimizing quantitative PCR approaches to go and detect specific T-cell clonotypes. That would, in a sense, open up a whole new area of diagnostics, where immune responses are important to outcome.
Besides studying the immune system, you are also the head of sequencing at the BC Cancer Agency Genome Sciences Centre. Can you talk for a moment about how you are equipped with sequencing instrumentation?
We have 11 Illumina instruments, and we are just in the process of upgrading those to GA IIx format. Those run near 24/7, and we have a sizable backlog of libraries waiting to be run in those. And then we have a production library construction core, which really has been key to keeping the instruments fed. We also have 2 ABI SOLiDs that we received recently, and we are working on getting those up and running. And then we have a capillary sequencing core as well that has eight 3730s. They are quite heavily utilized still.
Where do you see the next big advancements in sequencing technology coming from?
I find that hard to predict. Clearly, single-molecule approaches, ultimately, will be enabling for a lot of things you want to do, even faster and cheaper sequencing. But I am happy to wait and see at this point. We try to be involved in all different early-access programs with the vendors that we can. We are not instrument developers ourselves, but we are quite keen and enthusiastic about trying out the new technologies as they become available. The vendors need groups with expertise in library construction, for example, plus expertise in different areas of application, for example, immunology and T-cell repertoire analysis; and oncology, finding tumor mutations. And researchers that have actual interesting questions are now enabled by yet another generation of sequencing instrumentation.