NEW YORK (GenomeWeb) – Analyzing genomes of single cells as opposed to several cells in bulk could help tease apart genetic heterogeneity and cellular interactions, but research on single cells has been limited in scope due to the cost of sequencing or library generation, and the difficulties in isolating single cells. However, two research groups, from the University of Washington and Oregon Health & Science University, have now developed methods for analyzing single cells using combinatorial indexing that is scalable to thousands of cells and does not rely on ancillary equipment.
While both teams' methods, which were published today in Nature Methods, rely on combinatorial indexing, the OHSU group used it in conjunction with whole-genome sequencing of single cells while the UW researchers applied it to single-cell Hi-C sequencing, which determines the three-dimensional structure of genomes.
Jay Shendure, professor of genome sciences at UW and senior author of the Hi-C paper, said in an interview that both studies built on a collaboration between his group and Illumina. The two developed a combinatorial indexing technique using transposase, known as CPT-seq, that they published in Nature Genetics in 2014. At the time, Andrew Adey, senior author of the other Nature Methods study published today, was a graduate student in Shendure's lab.
The premise behind combinatorial indexing is that a series of barcoding and dilution steps eventually results in a situation where single cells will be uniquely tagged, without having to physically separate the cells. Microfluidics equipment is not needed and the method is scalable, Shendure said, adding that the same combinatorial indexing strategies used in these papers could be applied to other single-cell sequencing techniques.
In the UW study, Shendure's team used combinatorial indexing to sequence 10,696 cells. To do this, the researchers started with a population of between five million and 10 million cells, which they fixed and lysed to generate nuclei, followed by in situ restriction digestion. The nuclei were then distributed to a 96-well plate and a barcode was applied through ligation of biotinylated double-stranded bridge adaptors. The nuclei were then pooled, ligated, diluted and redistributed into a second 96-well plate, with no well containing more than 25 nuclei. After another round of lysis and barcoding, the number of possible barcode combinations was 96 by 96, or 9,216, which exceeded the number of total nuclei, giving each nucleus a high probability of being tagged by a unique barcode combination, the authors wrote.
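The arithmetic behind that claim can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope model, not the authors' own calculation: it assumes nuclei carry first-round barcodes uniformly at random, and the function names are hypothetical.

```python
def barcode_combinations(first_round: int, second_round: int) -> int:
    """Number of distinct two-round barcode pairs."""
    return first_round * second_round


def collision_probability(nuclei_per_well: int, first_round_barcodes: int) -> float:
    """Chance that a given nucleus shares its first-round barcode with at
    least one other nucleus in the same second-round well.

    Assumption (not from the paper): first-round barcodes are spread
    uniformly and independently across nuclei.
    """
    if nuclei_per_well < 1:
        raise ValueError("a well must hold at least one nucleus")
    # Probability that every other nucleus in the well carries a different barcode.
    all_different = (1.0 - 1.0 / first_round_barcodes) ** (nuclei_per_well - 1)
    return 1.0 - all_different


if __name__ == "__main__":
    print("barcode pairs:", barcode_combinations(96, 96))
    for n in (5, 10, 25):
        print(f"{n} nuclei per well: collision chance {collision_probability(n, 96):.3f}")
```

Under this simple model, the fewer nuclei that land in each second-round well, the lower the chance that two of them carry the same barcode pair, which is why the dilution step matters.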
The material was then pooled, purified, digested, and turned into Illumina sequencing libraries. Sequencing was performed using 2x250 base pair reads, long enough to perform the Hi-C analysis as well as to identify the barcode combinations.
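Once the barcodes have been read, assigning reads to cells amounts to grouping them by barcode pair. The sketch below illustrates the idea only. The tuple layout and barcode labels are hypothetical and do not reflect the UW team's actual file formats or pipeline.

```python
from collections import defaultdict


def group_reads_by_cell(reads):
    """Group reads by their (first-round, second-round) barcode pair.

    `reads` is an iterable of (round1_barcode, round2_barcode, sequence)
    tuples; this layout is an assumption made for illustration.
    """
    cells = defaultdict(list)
    for round1, round2, sequence in reads:
        # Each distinct barcode pair is treated as one putative single cell.
        cells[(round1, round2)].append(sequence)
    return dict(cells)


# Made-up example reads: two share a barcode pair, one differs in round 1.
example_reads = [
    ("A01", "B07", "ACGT"),
    ("A01", "B07", "TTGA"),
    ("A02", "B07", "GGCC"),
]
cells = group_reads_by_cell(example_reads)
```

In this toy input, the reads fall into two groups, one per barcode pair, even though all three passed through the same second-round well.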
The UW team demonstrated the protocol on mixtures of cells derived from two mouse cell lines as well as three human cell lines.
Shendure said that a previous single-cell Hi-C protocol was published a few years ago by Peter Fraser's group from the Babraham Institute in the UK. "That work was terrific," he said, "and shows what you can do with single-cell Hi-C data, but one challenge is that it's technically difficult and labor intensive." By contrast, combinatorial indexing does not require a specialized instrument and does not rely on isolating single cells, which helps mitigate batch effects, he said.
He added that his group plans to use this protocol to study genome conformation within the nucleus. The "long-term goal is to model the nucleus," he said, and "operating at the level of single cells will be important."
In the related study, also published today in Nature Methods, Adey's team from OHSU used the combinatorial indexing approach to sequence whole genomes of single cells. Adey, an assistant professor at OHSU, said in an interview that his group's long-term goal is to use the method to study heterogeneity in cancer.
In the study, they constructed libraries for a total of 16,698 single cells from cell lines, primate brain tissue, and two human adenocarcinomas.
The key hurdle in adapting the method for sequencing DNA, Adey said, was to remove nucleosomes bound to DNA without disrupting the nuclei. In combinatorial indexing, nuclei are barcoded and sequencing is performed on chromatin. But much of the DNA in native chromatin is bound by histones and inaccessible for sequencing unless the histones are removed.
To do that, he said, the group developed two strategies. In one, they used lithium salt to deplete the nucleosomes. This lithium-assisted nucleosome depletion (LAND) method resulted in slightly biased coverage but a large number of sequencing reads, Adey said. The second strategy, crosslinking with SDS (xSDS), used crosslinking and a detergent to denature the nucleosomes, and resulted in more even coverage but fewer reads.
Adey said that the group is now working to improve both methods. For instance, for the crosslinking approach, the researchers are looking to boost the number of reads by using different crosslinking agents and washes.
The team first tested both the LAND and xSDS strategies on a cell line that has been extensively studied. They found that both enabled the nuclei to stay intact, but the LAND method resulted in 1.7 billion unique reads, while xSDS resulted in 798 million unique reads.
Next, they combined the approaches with single-cell combinatorial indexing, first sequencing a lymphoblastoid cell line. They generated six libraries using the LAND method and one xSDS library. Coverage uniformity was around 1.57 times better using xSDS. They then called copy number variants and found that the LAND libraries had a high rate of such variants, at 61.9 percent, "suggesting an abundance of false positives due to lack of coverage uniformity," the authors wrote. The xSDS strategy resulted in an aneuploidy frequency of 22.6 percent, "much closer to the results of karyotyping."
They also tested their strategies on a tumor sample. From a stage III pancreatic ductal adenocarcinoma, the researchers ran the xSDS protocol, generating 1,715 single-cell libraries sequenced to a median unique read count of 49,272 per cell. Performing a copy number analysis of the cells, the researchers identified three clusters as well as copy number segments shared between the clusters that suggested they arose from a common progenitor. The researchers also identified cluster-specific CNVs.
Adey said that in the future, his group plans to continue to focus on developing the method and applying it to study cancer samples. The team is also considering how to apply the method clinically. In the analysis of the pancreatic tumor, Adey said, the group found "three distinct subclones with focal amplifications and deletions of genes relevant to cancer." Those findings suggest that in the future, the technique could be applied to profile tumors and "tailor treatment around the subtypes," he said.