By Monica Heger
Researchers at BGI have published two papers in Cell outlining a single-cell exome sequencing technique that they demonstrated on cell lines, a previously sequenced genome, and cancer patient samples.
The papers show that single-cell analysis can provide a much finer-grained genetic characterization of heterogeneous tissues than bulk tissue sequencing and also point toward the use of the method in areas beyond cancer, such as stem cell research and preimplantation genetic diagnosis, according to the BGI researchers.
Moving forward, the team plans to improve the technique and use it to analyze single cells from different cancer types to study "metastasis, recurrence, and [tumors] before and after therapy," Luting Song, project manager at BGI and a co-author on the papers, told In Sequence in an e-mail.
Single-cell sequencing is thought to be especially useful in cancer samples because tumors are heterogeneous and bulk sequencing may miss rare clonal types. Additionally, sequencing individual cells could help study tumor evolution or be used to monitor relapse by sequencing circulating tumor cells in patients' blood.
However, sequencing single cells is tricky because it is difficult to capture the entire genome of a single cell and amplification strategies introduce bias. Nevertheless, a number of researchers and companies have been working on strategies to sequence single cells, including a team from Cold Spring Harbor Laboratory, which developed a strategy that uses degenerate oligonucleotide-primed PCR to amplify the genome (IS 5/18/2010), and Rubicon Genomics, which uses a technique called thermal cycle library formation that enables the creation of multiple copies of a library from one template strand (IS 3/22/2011).
The BGI team described its method in two papers published in Cell last week. In one, the researchers first validated the approach in two lymphoblastoid cell lines, and then performed single-cell exome sequencing on 90 cells from a patient with a myeloproliferative tumor.
In the second paper, the team demonstrated the single-cell exome sequencing method on 25 cells from a patient with clear cell renal cell carcinoma.
The BGI method relies on multiple displacement amplification, or MDA, which uses the phi29 DNA polymerase to amplify the DNA in a linear fashion. According to Song, compared to the degenerate oligonucleotide-primed PCR, or DOP, method, MDA generates larger amplicons (on average 10 kilobases compared to 1 kilobase with DOP), which "results in significantly higher genome recovery" and "allows greater resolution."
The greater resolution of MDA enables single-nucleotide variants to be called, while DOP can only reliably call copy number variations, Song added. The CSHL team that initially published the DOP method is now using it to analyze copy number variation from prostate cancer patients.
However, DOP tends to have less amplification bias, and Song said that a promising strategy is to combine the two approaches, which would enable analysis of single-nucleotide variants, copy number variation, and potentially other types of variation.
BGI researchers have already been using a single-cell sequencing strategy that incorporates both MDA and DOP, and at last year's American Association for Cancer Research meeting, BGI America's CEO Xun Xu presented preliminary data from single-cell exome sequencing of 400 renal cell carcinoma cells. Additionally, Xu said at the time that the team is using the technique to study five different cancer types (IS 4/12/2011).
As described in last week's Cell papers, the BGI team first tested its strategy on two cells from lymphoblastoid cell lines derived from an individual whose genome had previously been sequenced — a diploid Han Chinese genome that BGI published in Nature in 2008. They evaluated the technique for whole-genome recovery, amplification, uniformity, sensitivity, and specificity.
Two cells were randomly selected and extracted from the cell line, and multiple displacement amplification was then used to amplify the whole genome of each cell.
Next, the PCR products were filtered for genomic content — each was tested for the presence of 10 housekeeping genes — and those that passed went on to library construction for whole-genome sequencing on the Illumina HiSeq.
The BGI researchers performed whole-genome sequencing with 100-base paired-end reads and a 350-base-pair insert size. They were able to align 42.27 gigabases from one cell and 47.65 gigabases from the other, corresponding to 97.25 percent of bases with 15.88-fold mean coverage and 95.64 percent of bases with 17.9-fold mean coverage, respectively.
As a control, the team did multicell sequencing from the same cell line, aligning around 99.91 percent of the bases at a mean 18-fold coverage of the whole genome.
According to the authors, while this fold coverage is less than what would typically be used to call variants, it was sufficient for their purposes since the object was not to identify mutations, but to assess the efficiency and performance of the whole-genome amplification method.
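The relationship between aligned bases and mean fold coverage can be checked with simple arithmetic. The sketch below is illustrative only: it takes the aligned-base totals and coverage figures reported for the two single cells and back-calculates the reference size they imply. The article does not state the denominator the authors used, so the function name and the interpretation of the result are assumptions.

```python
# Mean fold coverage = aligned bases / size of the region being covered.
# The aligned totals and coverage values are the figures quoted in the article.
# The denominator is NOT reported, so it is back-calculated here (assumption).

def implied_reference_size_gb(aligned_gb: float, mean_coverage: float) -> float:
    """Return the reference size, in gigabases, implied by the reported figures."""
    return aligned_gb / mean_coverage

# (aligned gigabases, mean fold coverage) for each single cell
cells = {
    "cell_1": (42.27, 15.88),
    "cell_2": (47.65, 17.9),
}

implied = {
    name: implied_reference_size_gb(aligned_gb, coverage)
    for name, (aligned_gb, coverage) in cells.items()
}

for name, size_gb in implied.items():
    print(f"{name}: implied reference size is about {size_gb:.2f} Gb")
```

Both cells imply a denominator of roughly 2.66 gigabases, which suggests the two coverage figures were computed against the same reference length.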
For both cells, the strategy was able to cover more than 95 percent of the reference genome at 15-fold coverage. A comparison of the coding regions of the single cells with the multicell control showed that 25 percent of the bases in the single-cell samples were covered at 18-fold or more, which at the same sequencing depth was 50 percent fewer bases than in the multicell sample, likely due to "bias in the amplification process," the authors wrote.
Further evaluation found that amplification bias was directly correlated with GC content, with regions of high GC content not amplifying well. In gene-coding regions that were not amplified at all, median GC content was 60.12 percent. Amplification efficiency was not correlated with other factors like chromosome location or homopolymers.
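As a rough illustration of the GC-content measurement described above, the following sketch computes the GC fraction of a few regions and takes the median. The sequences are made-up toy data and the function and variable names are hypothetical; this is not the authors' analysis pipeline.

```python
from statistics import median

def gc_fraction(sequence: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)

# Toy regions (hypothetical); real input would be the coding regions
# that failed to amplify.
toy_regions = {
    "region_1": "GGCCGCGC",
    "region_2": "ATGCGCAT",
    "region_3": "AATTATAT",
}

gc_by_region = {name: gc_fraction(seq) for name, seq in toy_regions.items()}
median_gc = median(gc_by_region.values())

print(f"median GC content: {median_gc:.2%}")
```

Applied to the unamplified coding regions, a calculation of this kind would yield the single summary figure the authors reported.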
Using the multicell sample as a control, the team next calculated the allele dropout rate of the single-cell technique. Allele dropout occurs when one allele at a heterozygous site is not amplified, resulting in a false negative. For both cells, they found the allele dropout rate was around 11 percent. The false discovery ratio, which reflects errors introduced during amplification, hybridization, or sequencing, was much lower. Comparing a subset of 99,152 well-defined homozygous sites, the team found that only two to three sites were discrepant in each single-cell sample.
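The two error measures can be sketched in a few lines. The example below assumes one plausible definition of each: allele dropout as the share of control-heterozygous sites that appear homozygous in the single cell, and the false discovery ratio as discrepant sites divided by the homozygous sites compared. The papers' exact definitions may differ, the genotypes are toy data, and all names are hypothetical.

```python
def allele_dropout_rate(control: dict, single_cell: dict) -> float:
    """Share of control-heterozygous sites called homozygous in the single cell.

    Both arguments map a site id to a genotype tuple such as ("A", "G").
    Sites missing from the single-cell calls are skipped.
    """
    het_sites = [
        site for site, (a1, a2) in control.items()
        if a1 != a2 and site in single_cell
    ]
    if not het_sites:
        return 0.0
    dropped = sum(1 for site in het_sites if single_cell[site][0] == single_cell[site][1])
    return dropped / len(het_sites)

def false_discovery_ratio(discrepant_sites: int, sites_compared: int) -> float:
    """Discrepant sites as a fraction of the homozygous sites compared."""
    return discrepant_sites / sites_compared

# Toy genotypes (hypothetical): four heterozygous control sites, one dropped.
toy_control = {"s1": ("A", "G"), "s2": ("C", "T"), "s3": ("G", "G"),
               "s4": ("A", "C"), "s5": ("T", "C")}
toy_single = {"s1": ("A", "G"), "s2": ("C", "C"), "s3": ("G", "G"),
              "s4": ("A", "C"), "s5": ("T", "C")}
toy_ado = allele_dropout_rate(toy_control, toy_single)

# Figures from the article: two to three discrepant sites out of 99,152.
fdr_low = false_discovery_ratio(2, 99152)
fdr_high = false_discovery_ratio(3, 99152)

print(f"toy allele dropout rate: {toy_ado:.0%}")
print(f"false discovery ratio range: {fdr_low:.1e} to {fdr_high:.1e}")
```

With the reported counts, the false discovery ratio works out to roughly two to three sites per 100,000, several orders of magnitude below the dropout rate of around 11 percent.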
The researchers then demonstrated the method on patient samples, first sequencing the exomes of single cells from a 58-year-old male with a type of myeloproliferative tumor known as essential thrombocythemia, or ET, and then, in the second Cell paper, sequencing the exomes of single cells from a patient with kidney cancer.
In the myeloproliferative tumor case, 82 cancer cells were selected from the patient's bone marrow for exome sequencing, along with eight matched normal cells. The exome of each cell was sequenced to a 30-fold mean depth. Cells with less than 70 percent coverage were filtered out, leaving 58 cells for analysis.
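The coverage filter amounts to a simple threshold. A minimal sketch follows, assuming each cell has a single exome-coverage fraction and that cells at exactly 70 percent are kept. The cell names and values are invented for illustration.

```python
MIN_COVERAGE_FRACTION = 0.70  # threshold quoted in the article

def filter_cells(coverage_by_cell: dict, threshold: float = MIN_COVERAGE_FRACTION) -> dict:
    """Keep cells whose exome coverage fraction is at least the threshold."""
    return {
        cell: fraction
        for cell, fraction in coverage_by_cell.items()
        if fraction >= threshold
    }

# Toy coverage fractions (hypothetical), not values from the paper.
toy_coverage = {"cell_A": 0.91, "cell_B": 0.64, "cell_C": 0.70, "cell_D": 0.35}
kept = filter_cells(toy_coverage)

print(f"kept {len(kept)} of {len(toy_coverage)} cells: {sorted(kept)}")
```

In the study, a filter of this kind reduced the 90 sequenced cells to the 58 used for analysis.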
Analysis of the cells pointed to a monoclonal tumor, rather than multiple subclones. The patient had previously tested negative for mutations in the JAK2 gene, which are found in around 55 percent of ET patients, and, as expected, none of the single cells contained mutations in that gene. The BGI researchers did find mutations in several other genes, including NTRK1, which they said was "especially intriguing, given that it is a tyrosine kinase receptor that functions in a similar biological pathway as JAK2."
In the second Cell paper, sequencing the exomes of single cells from a renal cell carcinoma patient suggested that the cancer may be "more genetically complex than previously thought," the authors wrote.
Previous sequencing studies of clear cell renal cell carcinoma have not identified many recurrent mutations between patients. In this study, exomes from 20 tumor cells and five normal cells of a 59-year-old Chinese male with clear cell renal cell carcinoma were sequenced. After sequencing, analysis revealed that three of the cells thought to be tumor cells were actually normal, highlighting the common problem of contamination from normal cells in cancer sequencing.
Song said that the finding also highlights single-cell sequencing as "a much more robust tool to exclude the influence of the normal cells mixed in the cancer tissue than in bulk sequencing."
As in the case of the myeloproliferative tumor, the team did not identify any subclones within the cancer cells, though they did find that the cancer cells were extremely diverse, with each seeming to have accumulated a plethora of passenger mutations.
In total, the team identified 260 single-nucleotide variants. The mutations covered 99 genes, four of which were also mutated in a cohort of 99 kidney cancer patients, although none were mutated in more than 5 percent of the patient cohort.