Name: Charles Lee
Titles: Director of cytogenetics, Harvard Cancer Center; associate professor, Harvard Medical School; associate member, Broad Institute of Harvard and MIT; clinical cytogeneticist, Brigham and Women’s Hospital
Background: 2002, clinical cytogenetics certification, American Board of Medical Genetics; 1998-2001, postdoc, Harvard Medical School; 1996-1998, postdoc, Cambridge University, UK; 1996, PhD, genetics, University of Alberta, Canada; 1990, BSc, genetics, University of Alberta
For most of this decade, the Boston lab of Charles Lee has been centered on the use of state-of-the-art molecular cytogenetic technologies to study the structure of vertebrate genomes to understand human diseases.
Lee is director of cytogenetics for the Harvard Cancer Center, associate professor of pathology at Harvard Medical School, associate member of the Broad Institute of Harvard and MIT, and a clinical cytogeneticist at Brigham and Women’s Hospital. His current research has three major components: the development and application of molecular cytogenetic tools for model organisms; structural genomic variation; and the identification and characterization of cancer biomarkers.
Lee is also a leader in identifying and interpreting genomic structural variation and is involved in ongoing efforts such as the 1000 Genomes Project and the Genomic Structural Variation Consortium. He is using his experience as a cytogeneticist, along with technologies like microarrays and second-generation sequencers, to better understand human genomic structural variation.
To get a sense of how the cytogenetics community is adopting these newer technologies, BioArray News spoke with Lee last week. The following is an edited transcript of that interview.
Why did you become a cytogeneticist?
My undergraduate degree was in genetics [at the University of Alberta], which I enjoyed very much because I found genetics to be largely about understanding concepts and problem solving — not the straight memorization of facts. During my senior undergraduate year, I did a research project with C.C. Lin, who was the director of cytogenetics at the University of Alberta Hospitals and maintained an active research laboratory as well. I admired individuals [who] could successfully take on both of these tasks. I had developed a passion for research, realized that I had a knack for microscopy and enjoyed visualizing chromosomes, much more so than just looking at bands in a gel, and recognized the satisfaction of being able to actively use the latest cytogenetic technologies.
After that, I went on to conduct two postdoctoral fellowships: one in basic research and one as a clinical cytogenetic fellow. My first postdoctoral fellowship was with Malcolm Ferguson-Smith at Cambridge University in England. He is a world-renowned cytogeneticist who was pursuing cutting-edge molecular cytogenetic technologies including spectral karyotyping, Rx-FISH, chromosome flow sorting and painting, as well as research in sex determination and prenatal diagnosis. For my clinical cytogenetic training, I trained with Cynthia Morton at Harvard Medical School. Now, I am board certified by the American Board of Medical Genetics and provide clinical cytogenetic services at Brigham and Women’s Hospital around 20 percent of the time and run a 23-member research unit at BWH/Harvard Medical School during the remaining 80 percent of my time. It is an ideal balance for me.
When you were doing research in Alberta and later in Cambridge, what technologies were available to you?
At the University of Alberta, we did a lot of G-banded chromosomal analyses, and were just beginning to use the technique of fluorescence in situ hybridization with locus-specific and chromosome-painting probes. Then, when I was in Cambridge, our laboratory used many advanced molecular cytogenetic technologies, including spectral karyotyping, chromosome flow sorting, cross-species chromosome painting, and fiber-FISH. There was no array technology in the laboratory at the time, as the first array CGH papers were just being published in 1997.
When did arrays enter the picture for you?
In 2003, I had just been promoted from instructor to assistant professor at Harvard Medical School and had set up my research laboratory with the ability to perform array-based comparative genomic hybridization experiments. However, we were far from being a leader in the aCGH field. That honor belonged to individuals such as Dan Pinkel and Joe Gray at the University of California in San Francisco, Nigel Carter at the Sanger Center, and Joris Veltman at the University of Nijmegen in the Netherlands, among others. They were really key figures in establishing aCGH as a technology that would soon become accessible to all cytogeneticists, whether conducting research or clinical diagnostics.
What kind of platform did you use at first?
The arrays being produced at that time were predominantly BAC-based arrays, containing human insert DNAs of around 150 kilobases. We were toying with the idea of making our own arrays, but it was clear that I did not have the manpower in my newly established laboratory or the financial resources to make this happen. I was then introduced to a company called Spectral Genomics that had just been established to make BAC-based arrays for aCGH experiments. I first began using their mouse BAC arrays and realized that many of their BAC clones were mapped incorrectly. About one-third of the clones [were] not only mismapped on the chromosome, but were even mapping to different non-homologous chromosomes. When I showed them the results from our FISH validation experiments, they asked me to serve as an unpaid consultant for the company. It was a wonderful opportunity to learn the fine details of array production and troubleshooting for aCGH experiments.
How has that developed?
We went from using Spectral Genomics mouse arrays that had an effective resolution of approximately 3 megabases, containing around 1,000 clones per array, to Spectral Genomics human arrays that had a similar effective resolution using about 1,000 clones per array. As you know, we have now gone from about 1,000 clones per array to over 2 million oligos per array.
We then transitioned to Spectral Genomics human arrays that contained approximately 3,000 clones per array, having an effective resolution of about 1 Mb for gains and losses in the human genome. At this time, one of my first postdocs, John Iafrate, began to test the arrays for clinical cytogenetic diagnostics. Being a good scientist, he first performed control experiments with the arrays. One of those control experiments was to compare the genomic DNA of one healthy, normal individual with another healthy, normal individual. Based on what we knew from our genetics courses and the Human Genome Project, the [genomes] of healthy, normal individuals were 99.9 percent identical and differed primarily with respect to SNPs. Since these array platforms would not detect SNPs, we expected to see 'flat-line' profiles during these control experiments. Clearly, we were surprised when the results that we got suggested that there were large-scale gains and losses in each of the 39 unrelated healthy, normal individuals tested. We then collaborated with Stephen Scherer at the Hospital for Sick Children in Toronto to collate the results in a database, now called the Database of Genomic Variants. We wrote up our findings and submitted the manuscript to Nature Genetics. The paper was published online on Aug. 1, 2004. Another paper, from Jonathan Sebat and Michael Wigler’s group at Cold Spring Harbor Laboratory, was published in Science within a week of our paper and showed the same phenomenon. These gains and losses, now referred to as copy number variants, [are] the largest component of currently known human structural genomic variants and [have] become an exciting field of research in human genetics. CNVs also have direct implications for the accurate interpretation of clinical aCGH diagnostic assays.
What does your technology mix look like now?
We don't use BAC-based arrays anymore; hardly ever. The main reason is that oligonucleotide arrays have really come a long way in terms of quality and increased resolution. It is even possible to develop custom array sets that combine data from multiple slides. For example, our Genomic Structural Variation Consortium, led by Stephen Scherer, Matthew Hurles, Chris Tyler-Smith, Nigel Carter from the Sanger, and myself, has constructed and used a NimbleGen array set containing 42 million oligo probes distributed across 20 slides to interrogate individual genomes for CNVs at a 500-base-pair effective resolution. In collaboration with [Seoul National University Professor] Jeong-Sun Seo, we also developed a 24 million oligo probe array set across 24 slides to achieve a similar effective resolution for CNV detection in humans. Some of the work in our laboratory also involves developing CNV maps for model organisms, and to complete this work, we have constructed custom arrays specific for the chimpanzee, macaque, and mouse.
Are you still using FISH and the older technologies?
We still do a lot of FISH. The reason is that there are limitations to any technology, including aCGH, and I consider FISH and aCGH to be complementary technologies. For array CGH, the main limitation is that it will not pick up balanced chromosomal rearrangements. You can detect gains and losses, but, let's say for example, you detect extra chromosomal material in a child with dysmorphism and developmental delay. You don't know if that extra chromosomal material is tandemly arranged at the same chromosomal locus or if the extra chromosomal material is located at another chromosomal region, or even on another non-homologous chromosome. In the clinical setting, these two scenarios, tandem duplications and non-tandem duplications, can have quite different implications when counseling the parents of a child found to have extra chromosomal material. FISH studies on the child’s chromosomes will provide information on the chromosomal location of the extra material and FISH studies in the parents may reveal a balanced chromosomal rearrangement that is associated with an increased recurrence risk of having another child with extra chromosomal material.
We have been talking about CGH. What is your opinion of the SNP-based platforms?
As you know, aCGH is the term used when you label two DNA samples, one test DNA sample and one normal reference DNA sample, with different fluorochromes and co-hybridize the labeled DNAs onto a single array. SNP arrays are genotyping arrays that were originally designed to specifically detect single nucleotide substitutions, not genomic gains and losses. Both Affymetrix and Illumina have now come up with new SNP array designs that include additional probes to specifically detect copy number changes. In general, they can infer copy number changes through changes of hybridization signal intensity and the identification of long stretches of apparent SNP homozygosity. Since only one labeled genome is hybridized to each SNP array, some of the inferred CNVs are obtained after comparing the data obtained with a reference data set.
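[Editor's illustration: The two-color comparison described above is commonly summarized per probe as a log2 ratio of test to reference signal. The following is a minimal sketch, not the laboratory's actual pipeline. The threshold values, function names, and probe intensities are all assumptions chosen for illustration.]

```python
import math

# Illustrative thresholds (assumptions, not from the interview). In an ideal
# diploid comparison a single-copy gain gives log2(3/2), about 0.58, and a
# single-copy loss gives log2(1/2) = -1; real data are noisier.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def log2_ratio(test_intensity, reference_intensity):
    """Return log2(test / reference) for one probe."""
    return math.log2(test_intensity / reference_intensity)


def call_probe(test_intensity, reference_intensity):
    """Classify one probe as 'gain', 'loss', or 'neutral'."""
    ratio = log2_ratio(test_intensity, reference_intensity)
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    return "neutral"


# Hypothetical probe intensities as (test, reference) pairs.
probes = [(1000.0, 1000.0), (1500.0, 1000.0), (500.0, 1000.0)]
calls = [call_probe(t, r) for t, r in probes]
print(calls)  # ['neutral', 'gain', 'loss']
```

[In practice, calls are made on segments of neighboring probes rather than on single probes, which this sketch omits.]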
The nice thing about the newly designed SNP arrays is that you do obtain SNP and CNV data on a single platform. For genome-wide association studies, this means reduced input DNA, reagents, labor and costs, compared to running a SNP array and an aCGH assay on each individual. Still, there are also important limitations of these SNP arrays that scientists need to be aware of. For example, Affymetrix 6.0 chips generally detect common CNVs that are about 30 kb in size and larger. These chips are less likely to detect rare CNVs, especially if they are less than 30 kb in size. Similarly, the Illumina 660W platform is not designed to detect rare CNVs and for the smaller, common CNVs that it does target, it is only able to detect about 63 percent of them — probably due to a lack of sufficient informative probes for many of the targeted CNV regions. Illumina has also recently come out with another new SNP bead array design, the Omni array. Unfortunately, this is such a new product that we have not yet been able to assess the CNV detection abilities of this particular array. With so many commercial platforms currently available for detecting CNVs, what is really needed is a head-to-head comparison of all the array platforms — something that the Genomic Structural Variation Consortium is currently working on.
Every lab does seem to have its own mix of technologies. Can you talk about what the community is doing to address the issue of disparate labs obtaining different results for the same experiments?
As I mentioned, there should be an unbiased and comprehensive study conducted that compares the different technologies to assess what each of the platforms is picking up and what each is missing. I suspect that when you look at large-scale genomic imbalances, most platforms will pick them up. It is the smaller ones that could be interpreted differently, especially when they are embedded within or in close proximity to complex genomic regions containing segmental duplications or other repetitive elements.
For the clinical diagnostics arena, it would be useful to achieve standardization of array platforms. There is an International Standardization of Cytogenomic Array Consortium, being led by David Ledbetter and Christa Lese Martin at Emory University, whose aim is precisely this, for constitutional and eventually perinatal cytogenetic cases. This consortium is attempting to establish minimum requirements, such as what targeted regions and what minimum effective resolution should be achieved, for such a clinical cytogenetic diagnostic array. The consortium also aims to share array data among the members to assist with accurate interpretation of copy number changes. Each cytogeneticist could then use whatever platform he or she desires, so long as the array meets the minimum requirements set by the ISCA. For arrays being used for cancer cytogenetics, a cancer cytogenomics array consortium is also being established, an initiative led by Marilyn Li at Tulane University.
How do you run the projects you do each day while staying up to date on the mass of genomic structural variation information being produced globally?
If I understand correctly, you are referring to how we run translational- or clinically-based projects while more and more copy number variation information is being accumulated on normal individuals. This definitely is an ongoing challenge. For the past five years, the Database of Genomic Variants has been cataloguing published data on CNVs detected in normal, healthy individuals. Of course, all CNV projects have some false positive rate of detection that can be between 5 percent and 20 percent, or higher, and this information is unavoidably being incorrectly catalogued as normal variance. For clinical cytogeneticists, we rely on the information in the Database of Genomic Variants for helping us to interpret the pathogenicity of genomic imbalances seen in an array CGH result of a patient. To put it simply, genomic imbalances that have been found in normal, healthy individuals are less likely to be pathogenic in a clinically recognized patient. However, since we know that there is false positive data in the Database of Genomic Variants, we cannot solely rely on these criteria for interpretation of genomic imbalances in a patient. Fortunately, we also are able to use criteria that cytogeneticists have been using for decades, only then it was for gross chromosomal rearrangements. For example, if a chromosomal rearrangement was found in the patient and also found in one of the healthy parents, it was thought to be less likely to be pathogenic. Of course, there are always exceptions to such rules. For instance, a deletion in a patient may appear identical to a deletion in the healthy mother, but in the patient, the deletion unmasks a recessive mutation that is not present in the mother, leading to drastically different phenotypic effects.
Clearly, all clinical cytogeneticists have to be very careful with interpreting array CGH results, making use of all available clinical data, data in online databases such as the Database of Genomic Variants and DECIPHER, genomic content of the imbalances, and working closely with the genetic counselors to help them advise patients that the array CGH interpretations are not absolutely correct, but rather based on cumulative amounts of relevant information available to us. I hate to say this, but sometimes it seems like there is a little bit of an art involved in deciphering these imbalances. I think that as time goes on and we collectively acquire more genotypic and corresponding phenotypic data, these interpretations will become easier to make and more accurate. Needless to say, extra caution should be exercised in prenatal array CGH testing, and the assays should be used as an adjunct to other available information, such as ultrasound findings, family history, et cetera.
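[Editor's illustration: One step of the interpretation described above, checking whether a patient's imbalance coincides with a variant already catalogued in healthy individuals, can be sketched as an interval-overlap test. This is a hedged example only. The `Cnv` type, the in-memory catalog, the coordinates, and the 50 percent reciprocal-overlap cutoff are hypothetical, and the code does not use any real database interface.]

```python
from typing import List, NamedTuple


class Cnv(NamedTuple):
    chrom: str
    start: int  # inclusive, in base pairs
    end: int    # exclusive, in base pairs


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Smaller of (overlap / length of a) and (overlap / length of b)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def matching_catalog_entries(patient: Cnv, catalog: List[Cnv],
                             min_overlap: float = 0.5) -> List[Cnv]:
    """Return catalog entries whose reciprocal overlap meets the cutoff."""
    return [entry for entry in catalog
            if reciprocal_overlap(patient, entry) >= min_overlap]


# Hypothetical catalog of variants reported in healthy individuals.
catalog = [
    Cnv("chr1", 100_000, 200_000),
    Cnv("chr1", 500_000, 520_000),
    Cnv("chr2", 100_000, 200_000),
]
patient = Cnv("chr1", 120_000, 210_000)
matches = matching_catalog_entries(patient, catalog)
print(matches)
```

[As the answer above stresses, a match is only one piece of evidence: catalogued variants can be false positives, and parental and clinical data still need to be weighed.]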
How has your rate of diagnosis developed over the years that you've used arrays?
In the past, when the gold standard for genetic testing of patients with developmental delay and dysmorphism was G-banded karyotyping, as many as 98 percent of the cases had a normal karyotype. Now, with aCGH testing, abnormal results can be obtained in as many as 18 percent of these same cases. This represents a substantial increase in pickup rate, prompting many clinical laboratories in Europe and some in North America to adopt aCGH testing as a first line of testing, prior to G-banded karyotyping. Such a strategy could save a clinical laboratory a lot of time and resources.
Interestingly, some believe that using higher and higher resolution arrays will continue to increase the abnormal pickup rate in clinical laboratories. However, the available data does not appear to support this assumption. As we utilize higher-resolution arrays, we appear to be detecting more benign CNVs and not many more pathogenic genomic imbalances.
How do you see this technology developing over the next few years?
In addition to genomic imbalances, segmental uniparental disomy, or genomic regions where both alleles are derived from only one of the parents, can sometimes also lead to clinical genetic pathogenicity. For example, Prader-Willi syndrome occurs in about one in 25,000 newborns and can result from a loss of paternally inherited genomic sequences in chromosome region 15q11-q13. I suspect that there are many other genomic disorders that are also due to specific segmental uniparental genomic regions. Such regions that are associated with a deletion can be identified by aCGH. If the uniparental genomic region is disomic, or present in two copies, aCGH will not be able to detect the aberration. SNP arrays could. So, I see more use in the clinic of either SNP arrays that also detect genomic imbalances or aCGH platforms that are also designed to obtain SNP information.
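[Editor's illustration: SNP arrays can flag candidate copy-neutral regions of this kind by looking for long stretches of homozygous genotype calls. The sketch below shows only that run-finding step. The genotype encoding ("AA", "AB", "BB"), the function name, and the minimum run length are assumptions, and real analyses work with thousands of SNPs and genomic distances.]

```python
def homozygous_runs(genotypes, min_run=5):
    """Return (start, end_exclusive) index pairs for each run of
    consecutive homozygous calls that is at least min_run long."""
    runs = []
    run_start = None
    for i, genotype in enumerate(genotypes):
        is_homozygous = genotype in ("AA", "BB")
        if is_homozygous and run_start is None:
            run_start = i  # a new run begins here
        elif not is_homozygous and run_start is not None:
            if i - run_start >= min_run:
                runs.append((run_start, i))
            run_start = None  # a heterozygous call ends the run
    # Close a run that extends to the end of the list.
    if run_start is not None and len(genotypes) - run_start >= min_run:
        runs.append((run_start, len(genotypes)))
    return runs


# Hypothetical genotype calls for consecutive SNPs along one chromosome.
genotypes = ["AB", "AA", "BB", "AB", "AA", "AA",
             "BB", "AA", "BB", "BB", "AB", "AA"]
runs = homozygous_runs(genotypes, min_run=5)
print(runs)  # [(4, 10)]
```

[A long homozygous stretch with normal copy number is only a hint. Confirming uniparental origin would require comparison with parental genotypes, which this sketch does not attempt.]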
What about next-generation sequencing tools? Do they have any impact on what you currently do?
Absolutely. In our research laboratory, about a quarter of the ongoing projects involve some sort of next-gen DNA sequencing. I don’t see next-gen sequencing replacing array-based technologies in the clinic anytime soon, but as the cost of DNA sequencing continues to fall, more research projects are emerging that involve exome sequencing or even whole-genome sequencing.
Currently, our research laboratory is involved in the 1000 Genomes Project. Our laboratory is specifically involved in the structural genomic variation analysis group. With several other groups, we are attempting to accurately annotate structural genomic variants in each individual being sequenced, using primarily short, approximately 105 bp paired sequence reads. Although we have made significant progress in developing and validating many different computer algorithms, we are still a long ways off. It appears that identifying structural genomic variants is much more difficult than identifying SNPs.
You are involved in a number of projects. What's occupying most of your time at the moment?
I would say that currently three-quarters of the research going on in our research laboratory revolves around some form of structural variation identification or characterization, either of humans or model organisms. I have been warned by some to take on fewer projects and focus on a few that are of high priority and potential impact. It has been difficult to do so. There is still so much to be learned about structural genomic variants in both humans and model organisms: where the variants are located, what genomic impact they have, and what subsequent phenotypes result from these variants, including disease susceptibility. I am passionate about pursuing this knowledge to the full extent of the time and resources available. Some colleagues of mine have told me that they suspect that I also have a touch of attention deficit disorder. I think that they are correct.
What kinds of bioinformatics tools have you been using to answer these research questions?
Until recently, we have been very dependent on software produced by the various array production companies. We have also been using software from other companies that are developing platform-independent analysis algorithms, such as the Nexus program from BioDiscovery. However, last year, we were fortunate to recruit Ryan Mills to our group, who is a senior bioinformatician and team leader, with interests in analyzing next-generation DNA sequencing data and identifying genetic variants. He has been invaluable to our research unit. I can’t imagine how any genomic-based research or clinical laboratory can manage without access to talented bioinformaticians. I know that there is a shortage of clinical cytogeneticists in North America, but I suspect that there is a higher demand for bioinformaticians these days.
If you were to think about the cytogenetics community as a whole, is it just elite labs that are adopting this technology?
I would say that two years ago it was just the elite labs. But that's changing. Just off the top of my head, I suspect there are at least 20 or more laboratories in this country that have now adopted array-CGH in some form. As a program committee member for the American Society of Human Genetics, I have seen many more abstracts on aCGH findings this year than in previous years. The program for our upcoming meeting in Hawaii will undoubtedly be an exciting one — in part due to the fact that aCGH is no longer limited to elite laboratories, but is now widely used. It truly is an exciting time for cytogeneticists.