This story was originally published Dec. 21, 2012.
Scientists at Harvard University have developed a new whole-genome amplification method with high uniformity that enables them to sequence the genomes of single cells to a high degree of completion.
A study published in Science this week shows that the method, called multiple annealing and looping-based amplification cycles, or MALBAC, can be used to cover as much as 93 percent of the genome of a single human cancer cell at 25x mean sequencing depth. From the data, the researchers, led by Xiaoliang Sunney Xie, a professor of chemistry and chemical biology at Harvard, were able to detect copy number variations and, by sequencing three related cells, identify true single nucleotide variants.
They also applied MALBAC to sequencing single sperm cells, allowing them to study meiotic recombination, results they published separately in Science.
The new method is one of a number of efforts by various research groups to improve single-cell genomics approaches, some of which were presented at a conference earlier this year (IS 5/15/2012).
The MALBAC paper is "stimulating work and a nice contribution to the rapidly growing field of single-cell genomics," said Steve Quake, a professor of bioengineering at Stanford University. Earlier this year, Quake's group published its own sequencing and genotyping study of single sperm cells, using multiple displacement amplification, or MDA, and microfluidics (IS 7/24/2012).
"MALBAC opens a door to many critical questions, such as copy number variations in a population of cells, variations in transposon jumping, chromosomal translocations, or, as shown by the authors, how mutations accumulate during cell proliferation," said Bing Ren, an assistant professor of cellular and molecular medicine at the University of California, San Diego. He said he plans to adopt the protocol, though he noted that whole-genome sequencing is still expensive, "so it is hard to do this on a regular basis."
MALBAC "has a major advantage over current [methods] in that it can reliably provide high percentage coverage of the genome in single-cell sequencing experiments," said Chuan He, a professor of chemistry at the University of Chicago. "I think potential commercialization of this technology could significantly facilitate single-cell sequencing research on various biological processes, as well as future disease diagnostics."
According to Xie, MALBAC could enable many types of single-cell genomics studies that have not been possible to date, including studies of the methylome and chromatin structure. It might also have clinical applications, for example for the analysis of circulating cancer cells.
Shedding light on how genomic variation arises and spreads in populations of cells, the method also contributes to a better understanding of evolution and cancer. "Being able to study single cells and measuring many different cells, we will get clues about mutation rate and the genesis of cancer," Xie said.
Because there is no single-molecule sequencing method available yet that can analyze an entire human genome, single-cell sequencing studies have to rely on whole-genome amplification.
However, current WGA methods, such as PCR-based approaches or MDA, have biases and amplify DNA in a nonlinear fashion, leading to uneven amplification and coverage of a genome.
MALBAC reduces this bias by using quasilinear amplification instead. "We have very even amplification because we use linear amplification," Xie said. "PCR, as well as MDA, use exponential amplification, so if there is any preference [for certain regions of DNA], it will just get amplified very rapidly."
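The compounding effect Xie describes can be illustrated with a toy calculation. The sketch below is not the MALBAC protocol: the function names, the per-cycle efficiencies, and the cycle count are hypothetical values chosen only for illustration. It compares two genomic regions whose per-cycle copying efficiency differs slightly, first under an idealized exponential model, in which every existing copy serves as a template in each cycle, and then under an idealized linear model, in which only the original template is copied.

```python
def exponential_yield(efficiency: float, cycles: int) -> float:
    """Copies per starting molecule when every existing copy is a template each cycle."""
    return (1.0 + efficiency) ** cycles


def linear_yield(efficiency: float, cycles: int) -> float:
    """Copies per starting molecule when only the original molecule is a template."""
    return 1.0 + efficiency * cycles


def bias_ratio(yield_fn, favored: float, disfavored: float, cycles: int) -> float:
    """Ratio of final copy numbers between a favored and a disfavored region."""
    return yield_fn(favored, cycles) / yield_fn(disfavored, cycles)


if __name__ == "__main__":
    # Hypothetical efficiencies: one region is copied 95% of the time per cycle,
    # the other 85% of the time. The cycle count of 20 is also illustrative.
    favored, disfavored, cycles = 0.95, 0.85, 20
    exp_bias = bias_ratio(exponential_yield, favored, disfavored, cycles)
    lin_bias = bias_ratio(linear_yield, favored, disfavored, cycles)
    print(f"exponential model, favored/disfavored: {exp_bias:.2f}")
    print(f"linear model, favored/disfavored:      {lin_bias:.2f}")
```

With these made-up numbers, the exponential model leaves the favored region nearly threefold over-represented, while the linear model keeps the ratio close to 1.1. Real amplification chemistry is messier than either model, so the figures indicate only the direction of the effect.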
Briefly, a pool of random primers is hybridized to picograms of single-stranded template DNA from a single human cell. Next, a polymerase with strand-displacement activity generates semiamplicons, ranging in length from 0.5 to 1.5 kilobases. These semiamplicons are melted off and amplified further into full amplicons with complementary ends, which form a loop and are no longer accessible as templates for amplification. After five cycles of such linear preamplification, full amplicons are exponentially amplified by PCR, yielding micrograms of DNA for sequencing.
For their study, the researchers used MALBAC on single cells from a human cancer cell line and sequenced the amplified DNA at 25x mean depth. They were able to cover between 85 percent and 93 percent of each cell's genome with at least 1x depth on either strand. MDA of a single cell and sequencing, by comparison, only covered 72 percent of the genome, and the coverage was less uniform than with MALBAC.
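To make the coverage figures concrete, the sketch below computes breadth of coverage (the fraction of positions with at least 1x depth) and mean depth from a list of per-base depths. The function names and the ten-position toy depth profile are hypothetical and are not data from the study. The last function gives the breadth expected if reads landed uniformly at random (a Poisson idealization), which at 25x mean depth is essentially 100 percent. Under that assumption, the shortfall to 93 percent or 72 percent reflects uneven amplification rather than insufficient sequencing.

```python
import math


def breadth_of_coverage(depths, min_depth=1):
    """Fraction of positions whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= min_depth for d in depths) / len(depths)


def mean_depth(depths):
    """Average read depth across all positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


def ideal_breadth(mean_depth_value):
    """Expected breadth at >=1x if reads fell uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-mean_depth_value)


if __name__ == "__main__":
    # Hypothetical, deliberately uneven depth profile with a mean of 25x.
    toy_depths = [0, 0, 3, 12, 40, 55, 7, 0, 61, 72]
    print("mean depth:", mean_depth(toy_depths))
    print("observed breadth:", breadth_of_coverage(toy_depths))
    print("ideal breadth at the same mean depth:", ideal_breadth(mean_depth(toy_depths)))
```

In practice the per-base depths would come from an alignment tool rather than a hand-written list; the toy profile only shows how a high mean depth can coexist with uncovered positions.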
Because of MALBAC's more uniform amplification, the scientists were able to determine copy number variations across the genomes of three individual cells, and found cell-to-cell differences. "It really points to the fact that our genome is dynamic, it changes with time," Xie said.
They also called single nucleotide variants from the data, which required them to work in a clean room in order to prevent contamination with foreign DNA. In the future, Xie said, this work could be conducted inside of a microfluidic device instead.
By comparing the number of SNPs in single cells to that in bulk cells, they estimated that they were able to detect about three quarters of the SNPs present in the single cells, more than an MDA-based approach, which detected only about 40 percent of SNPs. They also found that, as a consequence of allele dropout, they incorrectly called about 1 percent of SNPs homozygous when they were in fact heterozygous.
The number of false-positive SNPs was high because of errors made by the polymerase during the preamplification phase. To overcome these errors, the scientists separately sequenced three cells derived from the same precursor cell and compared their SNPs. This allowed them to reduce the number of false-positive SNPs by more than 100,000: "If all three of them had the same SNP, then we know that it is unique," Xie said.
In an additional step, they filtered out all false-positive SNPs that were due to systematic sequencing and amplification errors by comparing two unrelated single cells.
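The two filtering steps can be sketched with plain set operations. This is one plausible reading of the description above, not the authors' pipeline: the function names, the (chromosome, position, allele) representation of a variant call, and the example calls are all hypothetical. The first function keeps only variants called in every related cell, on the reasoning that independent polymerase errors rarely hit the same position in all three. The second drops calls that also show up in unrelated cells, which are treated here as recurring systematic artifacts.

```python
def consensus_calls(related_cells):
    """Keep only variants called in every related cell (requires at least one cell)."""
    return set.intersection(*(set(calls) for calls in related_cells))


def remove_systematic(calls, unrelated_cells):
    """Drop variants that also appear in any unrelated cell (assumed systematic errors)."""
    recurrent = set().union(*(set(c) for c in unrelated_cells))
    return set(calls) - recurrent


if __name__ == "__main__":
    # Hypothetical variant calls: (chromosome, position, alternate allele).
    cell_a = {("chr1", 1000, "T"), ("chr2", 500, "G"), ("chr3", 42, "A"), ("chr7", 900, "C")}
    cell_b = {("chr1", 1000, "T"), ("chr2", 500, "G"), ("chr5", 77, "C"), ("chr7", 900, "C")}
    cell_c = {("chr1", 1000, "T"), ("chr2", 500, "G"), ("chrX", 9, "T"), ("chr7", 900, "C")}
    unrelated_1 = {("chr7", 900, "C"), ("chr9", 1, "A")}
    unrelated_2 = {("chr7", 900, "C")}

    shared = consensus_calls([cell_a, cell_b, cell_c])
    final = remove_systematic(shared, [unrelated_1, unrelated_2])
    print(sorted(shared))
    print(sorted(final))
```

In this toy data, the calls private to a single cell drop out at the consensus step, and the call at chr7 that recurs in the unrelated cells is removed in the second step.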
To estimate the mutation rate in the cancer cell line, they allowed a single cell to propagate for 20 generations and then sequenced the entire population as well as a single cell and three of its descendants. They detected 35 unique SNPs in that cell, corresponding to a mutation rate of 2.5 SNPs per cell generation, a figure they said is consistent with estimates based on bulk sequencing data. "We were able to measure these newly acquired SNPs; only single-cell measurements can do this," Xie said.
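The arithmetic behind the rate is not spelled out here, and the raw ratio of 35 SNPs over 20 generations is 1.75, not 2.5. One way the figures could be reconciled, offered as an assumption rather than as the authors' stated calculation, is to scale the observed count up for incomplete detection. The sketch below uses a hypothetical function name and the "about three quarters" detection estimate mentioned earlier, taken as 0.75.

```python
def mutations_per_generation(observed, generations, detection_efficiency=1.0):
    """Per-generation rate, optionally corrected for the fraction of variants detected."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    if not 0.0 < detection_efficiency <= 1.0:
        raise ValueError("detection_efficiency must be in (0, 1]")
    # Dividing by the efficiency estimates how many variants were actually present.
    return observed / detection_efficiency / generations


if __name__ == "__main__":
    raw = mutations_per_generation(35, 20)
    # ASSUMPTION: 0.75 stands in for "about three quarters" of SNPs detected.
    corrected = mutations_per_generation(35, 20, detection_efficiency=0.75)
    print(f"uncorrected: {raw:.2f} per generation")
    print(f"corrected for detection: {corrected:.2f} per generation")
```

With the rounded 0.75 figure, the corrected estimate comes to roughly 2.3 per generation, in the neighborhood of the reported 2.5. The remaining gap may reflect a more precise efficiency value or other adjustments that are not given here.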
According to Nick Navin, an assistant professor at the MD Anderson Cancer Center, MALBAC's main weakness is its high SNP error rate. Reducing the error by sequencing three related cells will not always be possible, because the cells need to be cultured for that. "Thus I think the main utility of MALBAC will be for copy number profiling," he said, though commercial solutions already exist that work well for that at 50-kilobase resolution.
"The main advantage of MALBAC is the high coverage, which in theory would allow it to achieve higher copy number resolution, perhaps even at 1 kb, but the authors did not show this in the paper," said Navin. In 2011, he and his colleagues published a paper on single-cell sequencing and copy number analysis in breast cancer (GWDN 3/14/2011).
John Nelson, a researcher with GE Global Research, agreed that MALBAC's use of error-prone DNA polymerase is a concern, and that culturing cells to obtain three related cells "can be complicated and in many cases impossible."
Nelson, who helped develop MDA, said that he and his colleagues are working on a "much-improved version" of the MDA reaction under a grant from the National Institutes of Health. He noted that this effort has been going "extremely well" and collaborators plan to publish first results soon.
Xie and his colleagues are now working on making the coverage by MALBAC even better. "There is still some residual sequence bias, [but] I think we can still fine-tweak this to further improve the coverage, so we don't have to sequence three cells but [only] two cells to remove the false positives [SNPs]."
He added that a member of his team is currently exploring the "commercial offering of reagents and service" for MALBAC.
One of their first applications of MALBAC was to sequence individual human sperm cells. For that study, the scientists sequenced 99 individual sperm cells from a single person, as well as his diploid genome. They then used the data to phase his genome and to map meiotic recombination events, which they found to be rare near transcription start sites. They also found that crossovers were less frequent in aneuploid cells, suggesting that a failure to form crossovers in meiosis leads to chromosome segregation errors.
Xie and his team are applying MALBAC to many other projects now. One of them is to sequence circulating cancer cells in blood, which he said could have future diagnostic applications.
They have also sequenced the genome and the transcriptome of a single cell simultaneously by separating the nucleus and the cytoplasm prior to sequencing, results they plan to publish soon.
In addition, they plan to correlate copy number variants and transcriptome information. "Seeing this at the single-cell level really allows us to study the mechanism of CNVs and also characterize the effect, the correlation between genome and transcriptome. It's information that was not available before," Xie said.
In addition, he said, MALBAC could offer solutions to single-cell methylome and single-cell chromatin interaction studies. "It's going to be very exciting to see how it applies to these different areas, not only the genome and transcriptome," he said.