NEW YORK (GenomeWeb) — Ten years ago, a team at Washington University in St. Louis led by Elaine Mardis and Rick Wilson, co-directors of the Genome Sequencing Center at the time, sequenced the first cancer genome, that of a woman with acute myeloid leukemia (AML). The results were published in Nature in November 2008, alongside two other papers describing the genomes of an African HapMap individual and a Han Chinese individual. All three studies used the Illumina Genome Analyzer and were the first to sequence a human genome with that technology.
The AML genome, a landmark study, identified mutations in 10 genes, only two of which had previously been linked to this type of cancer. By showing that whole-genome sequencing of tumors was feasible, the project opened the door to numerous large-scale cancer genome sequencing studies, including The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC).
Since then, Mardis and Wilson have moved to Nationwide Children's Hospital in Columbus, Ohio, where they are helping to integrate genomics into clinical cancer care. In an interview on the sidelines of the Advances in Genome Biology and Technology annual meeting in Orlando, Florida last week, they recalled the challenges surrounding the first cancer genome and talked about how the field has progressed in the decade since. Below is an edited version of the conversation.
Can you reminisce a bit about how the first cancer genome came about?
RW: One thing that sticks in my mind is the first meeting of the group that was to put together the TCGA project in the summer of 2005 at the Ritz-Carlton in Washington. I remember getting into this discussion about technology and what we really needed — this was pre-[next-generation sequencing]. I remember saying, 'What we really need to be able to do for any cancer patient is to sequence their whole genome, the tumor genome, and the normal genome, compare the two, and see what's changed.' And Lee Hood, who is my old mentor, looked at me and said, 'That's right!'
We had already been sequencing tumors from AML patients and from lung cancer patients with PCR and Sanger sequencing [with collaborators], and for each of those projects, we were making lists of genes that we thought might be mutated, and we sequenced the exons of those genes. We found some things, but it was just clearly underwhelming, and we knew we had to figure out another way. So when next-gen sequencing finally came along, especially when we got the first two Solexa instruments outside of the UK, that's when we decided, 'This is the time to start this idea.'
EM: We were realizing what the potential for those instruments was, just based on how the libraries were being made, and the early work that we did with C. elegans sequencing also inspired this. Keep in mind, there was no hybrid capture at that point in time, that came along later, so we had to do whole-genome sequencing.
The human genome is big, but much like in the early days of the Human Genome Project, we practiced on C. elegans to get that paradigm in place. We resequenced the Bristol strain, which is the reference strain, and another strain of C. elegans that had never been characterized before, but we knew would have some genomic differences. We came up with this notion of comparing two genomes and identifying the variants that were different between the two, and we also did this PCR-based resequencing to make sure that we actually got it right. With the AML genome, we didn't know what types of approaches to use computationally to sort out the comparison between tumor vs. normal, it was all very new.
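The two-genome comparison they describe can be reduced, in highly simplified form, to a set difference over called variants. A minimal sketch, with entirely hypothetical variant records; real somatic callers model sequencing error, coverage, and allele fractions rather than comparing exact sets:

```python
# Minimal sketch of the tumor-vs-normal (or strain-vs-strain) comparison idea:
# call variants in each sample against a reference, then keep only those
# present in one sample and absent in the other.

def somatic_candidates(tumor_variants, normal_variants):
    """Return variants seen in the tumor but not in the matched normal."""
    return sorted(set(tumor_variants) - set(normal_variants))

# Hypothetical variant calls as (chromosome, position, ref, alt) tuples.
tumor = [("chr2", 209113112, "G", "A"),
         ("chr7", 55259515, "T", "G")]
normal = [("chr7", 55259515, "T", "G")]   # shared germline variant

print(somatic_candidates(tumor, normal))
# → [('chr2', 209113112, 'G', 'A')]
```

The surviving call is a somatic candidate, which, as described above, would then be checked orthogonally with PCR-based resequencing.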
RW: There was no software, there were really no methods. Solexa had some basic protocols but our technology development group had to spend a lot of time figuring out what worked best, all by trial and error.
Paired-end reads were not available, and we used 32-base pair reads. For each genome, tumor and normal, we had to do 100 runs to get the data because you didn't get quite a gigabase per run.
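Those run counts translate directly into genome coverage. A back-of-the-envelope check, using assumed figures (roughly 0.9 Gb of sequence per run, consistent with "not quite a gigabase," and a ~3.1 Gb haploid human genome):

```python
# Rough average-coverage estimate for the setup described above.
# All figures are approximations, not values from the original study.
GENOME_SIZE_BP = 3.1e9     # haploid human genome, ~3.1 Gb
YIELD_PER_RUN_BP = 0.9e9   # "not quite a gigabase" per run
RUNS = 100                 # runs per genome (tumor or normal)

coverage = RUNS * YIELD_PER_RUN_BP / GENOME_SIZE_BP
print(f"~{coverage:.0f}x average coverage")  # ~29x
```

Under these assumptions, 100 runs per genome lands in the ~30x range that later became a conventional depth for whole-genome sequencing.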
EM: 'A billion base pairs at a time,' that was the tag line for Solexa, and we were just blown away by that number.
The 2008 AGBT abstract book doesn't even mention this tumor genome project — your abstract in there only talked about AML transcriptome sequencing.
EM: We started doing transcriptome sequencing because we thought that was a faster route to figuring out the mutations. That was at a time when RNA-seq didn't even exist. I think we were getting a little bit frustrated by that, again, because the analytical methods, as well as the technical methods for making the libraries, weren't very good. That was at the point when we made the decision to just try and go for the genome as opposed to the transcriptome.
Sequencing was also quite expensive at that time. Did you have funding for this project available?
RW: No, no one would fund it. Everybody said it was a stupid idea. We asked if we could use some of our [National Cancer Institute] grant for it, and NCI didn't like it. They thought it was a bad idea. We tried to use some of our [National Human Genome Research Institute] funds, and they were happy for us to use funds for development but not for data production. So we were like, 'How are we going to do this experiment?'
A medical doctor, who used to be the chair of the Department of Medicine at WashU School of Medicine, introduced us to Alvin Siteman, who put up a naming gift for the cancer center at WashU. Elaine, Tim Ley, and I went and told him our idea. It was a really fascinating conversation. He asked us some great questions, and he called me the next day and said, 'We're going to transfer $1.5 million worth of stocks to WashU and they are going to put it into an account for you.' So we could do the experiment. And it worked. And when we finally published it in Nature, we took a preprint over to him that we had all signed and said, 'Thanks, you made this happen.'
What was the most expensive part of the project?
EM: Data generation was the bulk of it, which is an interesting contrast to nowadays, when data generation is almost nothing compared to the analytical cost. Then, the analytical cost was mainly that we just had not done this before, and certainly not at the scale of the human genome, so we really needed time to develop it out.
If you look at that paper, what we ended up using was not even what we use today. It was a decision tree-based analysis. That worked, obviously, but it gave us 10 genic mutations. We knew we had probably done a reasonably good job because a couple of those mutations had already previously been described in AML; the other ones were not. But, of course, we didn't know. For a whole-genome sequencing project, even if you are just looking at the exome, which is about 1.5 percent of the genome, it wasn't clear if 10 mutations was enough — we just didn't know. Now that hundreds of AMLs have been sequenced, we know that even across subtypes of AML, you typically see on the order of 10 to 12 mutations. So, we were not that far from the average.
How were the results received when you first presented them, for example, at the Biology of Genomes meeting in 2008?
RW: Well, people were excited. Even people who didn't want to be excited, I think, were excited. I talked about it at a TCGA meeting, and a deputy director of the NCI came rushing up to me afterwards and said, 'We funded this, right?' and I said 'No, you didn't. You wouldn't fund it, you said it was a dumb idea.' To be fair, we got $1.5 million from Mr. Siteman, and we probably spent about half a million dollars from NHGRI on a lot of the development work that had to be done. We talked to the program officer a lot, and he said, 'I'd rather you not use that for production but for development, because that's going to be applicable to a lot [of other projects.]' So each genome, one tumor and one normal, probably cost about $1 million.
EM: To put this into context, this was early days, pre-TCGA, but we did have this project that involved all of the sequencing centers that we referred to as TSP, the Tumor Sequencing Project. That was also published in Nature, a couple of different papers, one on the copy number analysis, which was array based, the other on the sequencing analysis, which was PCR and Sanger based.
We were sort of ramping up towards tumor sequencing anyway, and that project was funded by our NHGRI centers grants as a precursor to what turned into TCGA. But, I would argue that if it hadn't been for next-gen sequencing, and if it hadn't been for that early work on the AML tumor/normal genomes, we probably wouldn't have seen TCGA take off as quickly as it did.
EM: That was the influence of Matthew Ellis and his and my involvement with an NCI cooperative group called American College of Surgeons Oncology Group. Matthew had designed this trial, and he knew he wanted to do genomics — that was the reason for him to come to WashU from Duke, he wanted to come to a place where genomics was happening. The idea is that you have the ability to look at patients who have all been clinically treated the same, with a central question in mind, in this case, aromatase inhibitor response vs. lack of response.
The funny thing is, I was the basic and translational science coordinator for ACOSOG for about five or six years, and I remember presenting this idea at their annual meeting, which is all these clinical oncologists and surgeons, and talking about it in front of everybody, and it was very widely embraced by that group. And I remember somebody coming up to me after the talk was over and saying, 'You know, this probably won't work.' And I said, 'Oh, really? Well, why is that?' and he said, 'Because you have to understand, all of these samples were collected at a bunch of different sites, there is probably going to be variable quality, and you probably won't get enough good-quality samples to actually do your study.' That was a challenge to me and I was, like, 'We'll see about that.' And it actually worked out. We could have analyzed more samples, but in those early days, the cost of sequencing was always the biggest limitation on how much we could do.
And then, the floodgates opened – numerous projects have each sequenced hundreds of tumor genomes. Can you fast-forward to today and point out what has changed over the last 10 years? What were the most important advances in technology and on the analysis side? Also, how much has this changed how patients are treated?
EM: The first advance was paired-end sequencing and longer reads. The paired ends gave us a much better ability to identify structural changes in the genome, and the longer reads meant more reads mapped, and mapped unambiguously to where they actually came from in the genome.
The other thing is that there has been this uptrend in interest among bioinformaticians in the analysis of sequencing data. It had to come along because without those advanced computational methods, we would not have been able to do the things that we did. Now you see things like tumor heterogeneity, which pathologists have been remarking on for decades, and now we have this fine-tooth comb, where we can show at the genomic level what the heterogeneity is through advanced computational methods for modeling. All of those methods began to coalesce into this toolkit that then allowed us to start working with our clinical colleagues to ask questions. Important things like, 'How does the genome look before and after chemotherapy? How does it look before and after targeted therapy? Can you pinpoint the reasons for resistance?' And we know that, especially in targeted therapies, we can find that information pretty readily. In some cases, it's also led to the development of new therapies that address the aspects of resistance.
The other thing that's been key is the ability to get libraries from less and less input DNA. Clinically, one of the challenges is that we have to share these materials with pathologists who are doing conventional tests that are important for the diagnosis, as well.
RW: Also, as run yields have gone up and up and up, cost has gone down. So instead of a million bucks, it's close to $1,000 [per tumor genome], depending on what exactly you have in your lab. And we have learned so much about cancer biology. We have learned that what's really important is not what organ or tissue the tumor is derived from but what its mutational landscape is. There are some breast tumors that look more like brain tumors than other breast tumors, and when you start thinking about how you might treat those patients, you could take lessons from the guys who specialize in treating brain tumors.
EM: Or you could take drugs from them. There was a paper that just came out in Nature on HER2 point mutation inhibitors that I wrote a News and Views about. One of the end points in all of this is the change in clinical trials structure, which I think is incredibly exciting for patients, and I think pharmaceutical companies are warming up to it. The bottom line is, you can take these basket trials, where you have these multiple tissue sites where the tumor has originated but the same mutation, [and] you can treat all of those patients and then figure out which tissue types actually give a response. And in that trial, as in many basket trials we have seen, it's not every tissue. So, not all mutations have the same mutational context, and not all mutations have the same impact in different tissues, and by extension, nor does the drug. But this is a way of screening it through and then taking those tissues where you did see a response and moving it to a phase II trial, so you can look at more patients.
I think the other thing that's changed a lot is what I would call the multi-omic or integrated approach to cancer including transcriptome data, which is ironically where we started with the AML project, [and] turns out to be a really valuable add-on because it tells you about the program that's going on in terms of the cancer biology. But it doesn't just stop at DNA and RNA. Methylation profiling has also entered into the realm, where you can subtype and get prognoses from methylation.
And then I guess our work in the Pediatric Cancer Genome Project also should be mentioned. Pediatric cancers are rare diseases, and they also lack attention from drug companies and clinical trials, so cataloging how those cancers look in terms of mutation landscape and other aspects of the genome turned out to be really important. As we were devising the experimental approach to that project with St. Jude, it was a huge investment on their part in terms of not only the tumor types that we were going to study but also the approach. There was this back and forth about 'Should it be exomes? Should it be whole genomes?' and we really stayed the course on whole genomes because we wanted to be as comprehensive as possible. That, [it] turns out, was the right decision in retrospect because many pediatric tumors are driven by structural variants that create fusion genes. The mutational landscape is pretty sparse, but if you can identify these fusions, which is what we're doing now in our new work, along with other genomic features of pediatric cancers, then you can actually start to apply drugs.
For the first AML genome, you had these four tiers of mutations, including mutations in non-coding regions that could not be interpreted at the time. Have any of these mutations become interpretable — has it paid off to do whole genomes?
RW: They’re starting to. Probably the most interesting tier 2 mutations that we know about are the TERT promoter mutations.
EM: We didn't find those, but it certainly gives credibility to the notion of looking in the conserved non-coding space, because it's a highly conserved promoter for that gene that determines telomere length.
RW: I think as more and more noncoding regions of the human genome get annotated and functionalized in cancer, we can start to point at more of these. It's still early.
And in terms of sequencing read length, even though we've gone from 32 to 150 bases, it's still hard to do a great job of resolving a lot of regions of the human genome. So, I think we can still do better. As some of these longer-read technologies evolve and become less expensive, it's going to be fun to see how they play in. We're looking at PacBio Sequel and Oxford Nanopore now; we want to take a couple of cancer genomes that are well characterized and run them through those platforms. Like in 2008, they will be expensive, though they won't be $1 million a genome, but we will kind of go through the same process again with these longer reads and just see what we can find that we're missing with short-read technology.
Do you expect to be missing much?
EM: If Evan Eichler is to be believed, then we are.
RW: I think there is a tremendous amount in the genome that we're not yet understanding, and some of the talks here at AGBT point at what some of those things are. It's really learning how to read a completely different code.
You're talking about things like non-coding RNA?
EM: We've been telling people who can do this sort of thing that we need a second annotation tool for the human genome that annotates all of the non-coding RNAs that we know about, for example, in terms of not necessarily their function but where they lie in the genome, so we can begin to do a secondary analysis of whole-genome data.
RW: DNA was the obvious place to start, RNA was right there, but now we have to start looking at other kinds of signals that we can look at with all these other applications, some of which use NGS as a readout but approach library construction in a completely different way to capture things that give us clues about how different parts of the genome interact and affect gene regulation. This includes epigenetic marks such as DNA methylation and histone modifications.
Back in 2008, nobody was talking about immunotherapy yet, at least not in genomic circles. How did that come into play and coalesce with genomics?
EM: That was really the coalescence of a couple of collaborations that developed while we were still at WashU, which has a very rich history in immunology, in particular cancer immunology. Bob Schreiber, who I did that work with, and I had both been at WashU for almost the exact same amount of time, a long time, before we actually met. And the only reason we met is because Rick was out of town, and I was sitting in for him as the cancer center was preparing for the next renewal from the NCI. I ended up sitting next to Bob and meeting him for the first time and heard him give his talk. I had no idea who he was and that he was such a luminary in cancer immunology. But he called me a week later and said, 'I really enjoyed talking to you the other day and I wonder, I have a couple of crazy ideas, can we get together for a cup of coffee and talk about this?' So, one thing kind of led to another, and we published these two Nature papers looking at the immune landscape of this mouse model that he has worked with for years that's a lot like human tumors with high mutation burdens, like melanomas, for example, because it's induced by a treatment with a carcinogen. As we were developing that work with Bob, looking at the genomes of these mice, and figuring out what the putative neoantigens were through these pipelines that we were developing, we also started working on personalized melanoma vaccines. And those two projects ended up overlapping significantly because once we developed the tools to predict neoantigens, we could use that for everything we wanted to, whether it was mouse or human.
I don't think people really knew that this would turn into what it is now, immunogenomics, and as we pointed out in our very earliest studies, looking at RNA, [and] which of these neoantigens are actually expressed, especially in high-mutation load tumor types, is really important. You don't want to design vaccines from neoantigens that are not going to be turned into proteins.
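The expression filter described here can be illustrated as a simple cutoff over RNA-seq quantification. The gene names, peptides, and threshold below are hypothetical; real pipelines also weigh factors such as MHC binding affinity and clonality:

```python
# Toy illustration of the expression filter: keep only predicted
# neoantigens whose source gene is actually expressed in the tumor
# RNA-seq data, since unexpressed mutations won't yield protein.

def expressed_neoantigens(candidates, tpm_by_gene, min_tpm=1.0):
    """Filter (gene, peptide) candidates by an RNA expression cutoff."""
    return [(gene, pep) for gene, pep in candidates
            if tpm_by_gene.get(gene, 0.0) >= min_tpm]

# Hypothetical predicted neoantigens and expression values.
candidates = [("GENE_A", "KLMNPQRST"), ("GENE_B", "AVGFLWYQC")]
tpm = {"GENE_A": 12.4, "GENE_B": 0.2}   # GENE_B mutation is not expressed

print(expressed_neoantigens(candidates, tpm))
# → [('GENE_A', 'KLMNPQRST')]
```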
Now, we know that on top of the neoantigen prediction, the value of RNA is that we can tease apart the signatures of some of these invading immune cells. Before and after treatments like checkpoint blockade, you can actually say, 'Hey, we didn't have any T cells in here, and now we do,' and we can see their signature of gene expression just from looking at the RNA-seq data and compartmentalizing it accordingly.
To me, it's just amazing how it keeps building. From this very early beginning, where you are sort of stumbling around in the dark trying to figure out what the data means and how to interpret it, to now, where I feel like we're still developing, but the level of sophistication, and the ways that you can interpret the same dataset in multiple different frameworks, turn out to be very interesting and important. For the cancer genomics papers of today, you almost have to examine all of those different aspects, or else you feel like you're leaving something on the table, if you don't look at the immune-ome as well as the targeted therapy-ome as well as the interactome between DNA and RNA.
Finally, where do you see things going? And where do you see the next technological breakthroughs coming from?
RW: I want to be able to better understand epigenetic changes and histone involvement. One of the things we already know, for example, for brain tumors, is that quite often, just simple methylation array data can help a pathologist make a diagnosis. There are not many places doing that, and certainly not clinically. That's something we're going to try to push a little bit. And even then, you are looking at patterns that you might not really understand, so we have to get a better understanding of some of the biology behind that to make the whole picture clear.
EM: I think where this is moving is clinically. There are now well-sized studies in pediatric and adult patients that clearly illustrate a clinical benefit to patients, and the clinical utility of broad-scale profiling to identify vulnerabilities and identify targeted therapies.
Does this mostly apply to late-stage cancer patients?
EM: You're right, to some extent, we're still confined a bit by the practice of medicine, which says, until you fail the standard of care, you are not eligible. I think what the group at Memorial Sloan Kettering has shown, however, is that even studying a primary tumor can give you clues about how to treat the metastatic tumor. And you're much more likely to get that sample to begin with because a lot of metastases are not removed surgically or even sampled through a biopsy. So, getting that information in the bank, and in the electronic medical record, in case the patient proceeds to metastasis, can be important in identifying the vulnerabilities without having to wait, because you have basically already done the analysis. You might want to do a quick check to make sure that mutation hasn't gone away, but the bottom line is that that's an OK way to do things, even though you are typically not going to be able to apply the information right away.
One of the advantages in the pediatric setting is that we do have very rare tumor types – just because cancer overall in peds is less prominent as a disease, it's even easier to find things that are super rare. And there, you often do not have a defined standard of care because it hasn't been established by a clinical trial, because you can't get enough patients to put together a clinical trial. In some cases, we will actually identify vulnerabilities that can be used in a primary setting. There is a little bit more freedom to operate, if you will, from the oncologist's perspective. That still depends on a lot of things, like the ability to get the drug, and the ability to have insurance pay for it, etcetera.
It really takes a setting like the one we're in, where you have the entire clinical care team involved, not just the genomics people but everybody in the spectrum from surgeons to oncologists to pathologists in order to bring that all together and brainstorm around answers for individual patients. And we're certainly not the only people to do that, but it still remains a bit rare to have this broad team-based, hospital-based effort. We have had nothing but enthusiasm from our new colleagues at Nationwide Children's. We always talk about how people who didn't study genomics in medical school, which is most of the doctors in practice right now, don't have an appreciation for it. But one of our most enthusiastic cheerleaders is a 70-year-old medical oncologist who couldn't be more excited every time he comes to discuss a brain cancer patient, because he's been treating them for 40 years, and he is excited to see these new opportunities open up. It's very encouraging.
Is there anything else you'd like to mention?
RW: One of the things we learned in retrospect about the woman whose AML genome we sequenced [was that] we found the smoking gun; we just didn't know it until we then went off and sequenced another 200 patients and found that a lot of them had the same mutation. It wasn't a gene that would have been on our usual suspect list when we were sequencing with PCR and Sanger sequencing. There are a whole bunch of genes that would not have been on anybody's list of cancer genes.
EM: And with the Idhifa drug now being FDA-approved for IDH mutations in AML, that's actually a really gratifying complete circle from gene discovery all the way to therapeutic approval.
And just to amplify Rick's comments, all of the work really drives home the importance of patients who are willing to sign research consents to have their tumor material banked, because we would be nowhere if it were not for patients who have this altruistic idea that even though they probably won't be cured of their cancer, by donating it for research, they could help future patients.