Professor of Genetics
Harvard Medical School
Name: George Church
Title: Professor of Genetics, Harvard Medical School, since 1986
Experience and Education: PhD, Biochemistry and Molecular Biology, Harvard University, 1984 (worked with Nobel Laureate Walter Gilbert)
BA, Zoology and Chemistry, Duke University, 1974
George Church has been active in the field of DNA sequencing since the 1970s, when he started working with Walter Gilbert to develop the first direct genomic sequencing method. He also wrote the first automated DNA-sequencing software and, more recently, developed polony sequencing.
Church was one of the early advocates of the Human Genome Project and helped found genome centers at Stanford University, the Massachusetts Institute of Technology, and Collaborative Research, which later became Genome Therapeutics.
In 2005, Church launched the Personal Genome Project. In Sequence caught up with him last week to get an update on the project and his outlook on the field of next-generation sequencing.
Tell me about the Personal Genome Project. What are its goals?
The goal is to develop a context for genome-phenotype relations which is scalable, meaning that it’s inexpensive enough that you could apply it to hundreds of thousands of individuals, because only at that scale do you get sufficient statistics to study common traits, including common diseases. It is right now focused on sequencing coding regions and splice junctions, and we have been collecting some sequence data already. Right now, we have 10 individuals who are approved for the study, but we hope to scale that up very significantly soon.
Who are these individuals?
They are intended to be a fairly broad, diverse set of subjects. They tend to be very well informed on the issues. It’s our opinion that a variety of people can become well informed on the issues of inheritance and of the risks and benefits of having your genotype and phenotype correlated in a database.
When did the study and the analysis start?
The study started in August 2005, when it was approved by the Harvard Medical School [Institutional Review Board]. The analysis has already started, too. But like most things, it’s a long process, which gets faster and faster as we go forward. The data collection and analysis started in 2007.
What technology are you using to sequence these individuals?
We are using next-generation sequencing. We have eight of the Nikon instruments in operation.
So it’s based on your homebuilt polony sequencing technology that you published in Science in 2005?
I don’t think Nikon will like you to call it homebuilt. I think all of these [technologies] require some attention. They are all intended for experts, they all require quite a huge commitment to getting them operational because they produce large amounts of data, and they involve new protocols and so forth. Ours is not that different from the other ones.
How is the project funded?
The NIH funds the technologies, but the actual data production is funded by private donors and income from my patents. Some of the private donors are getting interested in scaling it up. Hopefully, we will know by summer.
Why did you choose to restrict your analysis to exons and splice sites?
[It is] not permanently restricted; that’s just a pragmatic start, because they constitute 1 percent of the genome and they probably have about 90 percent of the causative alleles. And we feel the causative alleles are more important than linkage; they are easier to interpret. If you can get all the exons and splice junctions [and nothing else], you can bring the cost [of the project] down by another factor of 100, which means that, basically, the cost of doing these studies is about $1,000 per subject.
That scales up very nicely, in contrast to Jim Watson’s genome, where [454 has] estimated it was $1 million to do his genome [on a 454 sequencer]. That does not scale well: 100,000 genomes at $1 million each is $100 billion, while 100,000 genomes at $1,000 each is only $100 million. The individuals participating in the study could pay for it themselves when it’s only about $1,000. It’s a huge difference. This is all about making a scalable project.
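A rough back-of-envelope sketch of the arithmetic in this answer follows. It assumes, beyond what the interview states, a haploid human genome of about 3.1 billion bases and a sequencing cost that scales linearly with the number of bases targeted; the constants and function names are illustrative only.

```python
# Back-of-envelope sketch of the cost arithmetic in the answer above.
# Assumptions (not from the interview): a haploid human genome of ~3.1e9 bases,
# and sequencing cost proportional to the number of bases targeted.

GENOME_BASES = 3.1e9   # assumed approximate haploid genome size
EXON_FRACTION = 0.01   # "1 percent of the genome" (exons + splice junctions)


def targeted_bases(fraction: float) -> float:
    """Number of bases in a target covering `fraction` of the genome."""
    return GENOME_BASES * fraction


def cost_reduction(fraction: float) -> float:
    """Fold reduction in cost if cost scales linearly with bases targeted."""
    return 1.0 / fraction


def cohort_cost(n_subjects: int, cost_per_subject: int) -> int:
    """Total cost of sequencing a cohort at a flat per-subject price."""
    return n_subjects * cost_per_subject


if __name__ == "__main__":
    print(f"Exon target: ~{targeted_bases(EXON_FRACTION) / 1e6:.0f} Mb")
    print(f"Cost reduction: {cost_reduction(EXON_FRACTION):.0f}x")
    n = 100_000
    for price in (1_000_000, 1_000):
        print(f"{n:,} subjects at ${price:,}: ${cohort_cost(n, price):,}")
```

Under these assumptions, targeting 1 percent of the genome means reading about 31 million bases instead of 3.1 billion, which is where a 100-fold cost reduction comes from; the cohort totals reproduce the $100 billion versus $100 million comparison.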
When are you planning to finish the first 10 individuals?
We hope to have some kind of report this summer, on the progress of the technology and preliminary results. The cell [lines], I believe, will also be available in that timeframe. [The] Coriell [Institute for Medical Research] has established immortal cell lines for PGP subjects.
What aspects of the technology are you still developing, especially to reduce the cost of the project?
[For example,] we are trying to drop the volumes that we use and [as a result] the price of the enzymes and the chemicals [used]. And then the instrument is getting much faster as well.
Tell me briefly about your most important contributions to DNA sequencing.
I got interested because I was doing computer sequence analysis in the early ’70s, and I realized that we needed a lot more sequences. At that time, it was possible to type in every sequence. So then I went to Wally Gilbert [at Harvard, whose group was] just starting to sequence, [and] they could read 30 bases or something. I helped […] sequence the first plasmid genome, which is pBR322, and immediately started thinking about ways that we could scale that up, and about multiplexing. That eventually led to [a paper on] genomic sequencing [published] in 1984 [in PNAS] and [an article on] multiplex sequencing published [in Science] in 1988, and led to the first bacterial genome sequence sold commercially [by Genome Therapeutics], which was Helicobacter pylori in 1994, the year before the Haemophilus [influenzae] sequence [was published].
Many of the concepts that were in the 1984 and ’88 papers are still present in the new polony [technology] … [and] are reflected in the current generation, the so-called next-generation sequencing.
How long will it take before we reach the $1,000 genome – and what technologies do you think will take us there? The current ones, or another generation?
I think the current ones will take us there. I don’t know exactly when. And I think the 100-percent genome is kind of a research fantasy rather than [something] practical. But I think that we will have the $1,000, 1-percent genome this year. I still think that people will want subsets of the genome. If you have a choice between doing 1,000 genomes at $3,000 each and doing 100,000 genomes at $500 each, you are going to take the less expensive one [per genome].
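To make this trade-off concrete, the sketch below compares the two options in the answer and, under a hypothetical fixed budget of $3 million (an assumption chosen only because it pays for exactly 1,000 genomes at $3,000, not a figure from the interview), counts how many subjects each per-genome price would fund.

```python
# Compare the two options in the answer above: total cost of each, and how
# many subjects a hypothetical fixed budget would cover at each price.


def total_cost(n_genomes: int, cost_per_genome: int) -> int:
    """Total cost of sequencing n_genomes at a flat per-genome price."""
    return n_genomes * cost_per_genome


def subjects_affordable(budget: int, cost_per_genome: int) -> int:
    """Whole number of genomes a budget can pay for at a flat price."""
    return budget // cost_per_genome


BUDGET = 3_000_000  # hypothetical budget, assumed for illustration only

for n, price in ((1_000, 3_000), (100_000, 500)):
    print(f"{n:,} genomes at ${price:,}: ${total_cost(n, price):,} total; "
          f"a ${BUDGET:,} budget covers {subjects_affordable(BUDGET, price):,} subjects")
```

At the same budget, the cheaper partial genome funds six times as many subjects, which is the kind of scale the interview argues is needed for statistics on common traits.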
Some people say you can already genotype the entire genome for $1,000. How much more information can we gain from generating sequence data?
Usually, when people talk about genotyping, they are talking about chips and assaying 500,000 SNPs, but that’s 0.01 percent of the genome, and most of it is not in coding regions, [which makes the results] hard to interpret and dependent on linkage. That linkage fails for many reasons. I think most of us are now focused on causative alleles rather than linked alleles. And even if you get the linked allele, and even if the linkage turns out to be correct, you still have to hunt it down to the causative allele before you can go forward much. Our philosophy is to go straight to the causative ones and skip the linkage.
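A quick check of the genome fractions quoted in this answer, assuming (this figure is not from the interview) a haploid genome of about 3.1 billion bases and one assayed base per SNP on the chip. It gives roughly 0.016 percent for a 500,000-SNP chip, the same order of magnitude as the 0.01 percent cited, against 1 percent for the exons and splice junctions.

```python
# Rough check of the genome fractions quoted above.
# Assumption (not from the interview): ~3.1e9 bases in a haploid human genome.
GENOME_BASES = 3.1e9


def percent_of_genome(bases: float) -> float:
    """Percentage of the genome represented by `bases` positions."""
    return 100.0 * bases / GENOME_BASES


snp_chip_sites = 500_000            # one base assayed per SNP on the chip
exome_bases = 0.01 * GENOME_BASES   # "1 percent" target: exons + splice junctions

print(f"SNP chip: {percent_of_genome(snp_chip_sites):.3f}% of the genome")
print(f"Exon target: {percent_of_genome(exome_bases):.1f}% of the genome")
print(f"Exon target assays ~{exome_bases / snp_chip_sites:.0f}x more positions")
```

The exon target covers roughly 60 times as many positions as the chip, and, per the answer above, those positions are the coding ones where causative alleles are most likely to sit.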