NEW YORK (GenomeWeb) – Sequencing technology company 10X Genomics this week released data generated by its GemCode platform on reference samples from the National Institute of Standards and Technology. The company has submitted the data to NIST's Genome in a Bottle Consortium, which is seeking to generate standards for next-generation sequencing.
In a webinar, 10X Genomics discussed the data, some of which it has made available online, as well as data from some of its early access users. The firm launched the GemCode platform at the Advances in Genome Biology and Technology meeting in Marco Island, Florida, in February.
The GemCode platform is designed to be used in conjunction with a short-read sequencer — currently, it is compatible with the Illumina HiSeq 2500 — and uses droplet microfluidic technology prior to sequencing to generate "linked reads" that can be used to piece together haplotypes.
The core technology is what the company refers to as a GEM — a gel bead inside emulsion — essentially "an oligo delivery device," Mike Schnall-Levin, VP of computational biology and applications, said during the webinar.
During the webinar, Schnall-Levin discussed experiments the company has done comparing the GemCode platform with standard Illumina sequencing. Many researchers aim to load as little DNA as possible onto the sequencer, particularly for clinical samples where material may be limited. The problem, though, is that with less input DNA there is an "explosion" of PCR duplication, Schnall-Levin said, which causes bias.
Using the GemCode platform and 1.2 ng of input DNA, PCR duplication rates are between 1 percent and 2 percent, he said. To get comparable performance using the standard Illumina TruSeq sequencing protocol, one would need to start with 200 ng of DNA, he said.
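To illustrate what a PCR duplication rate measures, the short Python sketch below treats reads that share the same chromosome, start position, and strand as copies of one original fragment. This is a simplification made for illustration and is not 10X Genomics' or Illumina's method; the function name and the example reads are hypothetical, and production duplicate-marking tools apply more elaborate criteria.

```python
def duplication_rate(read_keys):
    """Fraction of reads that are extra copies of an already-seen fragment.

    read_keys: iterable of (chromosome, start, strand) tuples, one per read.
    Assumption: identical keys mean PCR copies of the same fragment.
    """
    keys = list(read_keys)
    if not keys:
        raise ValueError("need at least one read")
    return 1 - len(set(keys)) / len(keys)


# Hypothetical reads: the first two share a key, so one of four is a duplicate.
example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
]
print(duplication_rate(example_reads))  # 0.25
```

By this measure, a rate of 1 percent to 2 percent means that nearly every read represents a distinct fragment of the input DNA.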
For the Genome in a Bottle samples, Schnall-Levin said the company is focusing on two stages — the pilot sample, known as NA12878, and an Ashkenazi trio. The data is available online.
Starting with 1.2 ng of DNA, the company generated DNA molecules with a mean length around 130 kb. Over half of all DNA molecules were longer than 100 kb and around 89 percent of molecules were longer than 20 kb.
Sequencing, which was done on the Illumina HiSeq 2500, generated 1.3 billion reads, 96.7 percent of which were mapped. Sequencing depth was 34x. PCR duplication was 1.5 percent, and around 8.5 percent of bases were not covered.
The company was able to phase 96.2 percent of SNPs into haplotype blocks. The haplotype block N50 was around 17 Mb, while the longest haplotype block was nearly 40 Mb. The company obtained comparable results from the other NIST samples as well. In addition, Schnall-Levin said, the haplotype block length was limited to 40 Mb for all samples because of the software, not the technology itself.
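For readers unfamiliar with the N50 statistic, the Python sketch below shows the standard calculation: sort the blocks from longest to shortest and report the length at which the running total first reaches half of the combined length. The block lengths are made-up values chosen for illustration, not 10X Genomics data, and the function name is hypothetical.

```python
def block_n50(lengths):
    """Return the N50 of a collection of block lengths.

    N50 is the length L such that blocks of length >= L together
    account for at least half of the summed length of all blocks.
    """
    if not lengths:
        raise ValueError("need at least one block length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical haplotype block lengths in Mb (illustrative only).
example_blocks_mb = [40, 25, 17, 10, 5, 3]
print(block_n50(example_blocks_mb))  # 25
```

An N50 of around 17 Mb therefore means that half of the phased sequence sits in blocks at least that long.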
Schnall-Levin highlighted a few genes for which having phase information was critical. In the MYPN gene, a 100-kb gene that has been implicated in cardiomyopathy, sequencing identified two missense mutations about 7.5 kb apart. The GemCode platform revealed that the two mutations were on the same copy of the gene, meaning there was still a second copy identical to the reference.
In a second gene, MEFV, which is associated with Mediterranean fever, the sample again had two missense mutations, but this time the mutations were on different copies of the gene. "If they were deleterious mutations, you would have knocked out both copies of the gene," Schnall-Levin said, which could have important clinical implications.
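The distinction between the MYPN and MEFV cases can be expressed in a few lines of Python. The sketch assumes phased heterozygous genotypes written in the VCF convention, where "0|1" and "1|0" indicate which haplotype carries the alternate allele, and it assumes both variants fall in the same phase block. The function name and the example genotypes are hypothetical and are not taken from the company's data.

```python
def phase_relationship(genotype_a, genotype_b):
    """Classify two phased heterozygous variants as 'cis' or 'trans'.

    Genotypes use VCF-style phased notation: '0|1' or '1|0'.
    Assumption: both variants belong to the same phase block, so the
    left and right sides refer to the same two chromosome copies.
    """
    for genotype in (genotype_a, genotype_b):
        if genotype not in ("0|1", "1|0"):
            raise ValueError(f"not a phased heterozygous genotype: {genotype}")
    # Same orientation: both alternate alleles sit on one copy (cis).
    # Opposite orientation: each copy carries one alternate allele (trans).
    return "cis" if genotype_a == genotype_b else "trans"


# Hypothetical genotypes mirroring the two scenarios described above.
print(phase_relationship("0|1", "0|1"))  # cis: one copy remains unaltered
print(phase_relationship("0|1", "1|0"))  # trans: both copies are altered
```

Without phase information, both cases look identical: two heterozygous calls in the same gene.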
Next, Schnall-Levin illustrated data from the child of the Ashkenazi trio. The child had three mutations in the CDH23 gene, a 419-kb gene implicated in deafness. The GemCode platform showed that two of those mutations were on one copy and one mutation was on the second copy. This phase information could then be confirmed via the sequence data from the parents.
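A rough sketch of how parental data can corroborate a child's phasing follows. It checks a necessary condition only: one of the child's haplotypes must consist entirely of alleles the mother carries, and the other entirely of alleles the father carries. The sketch ignores recombination and de novo mutation, is not the company's confirmation procedure, and uses hypothetical names and genotypes throughout.

```python
def consistent_with_parents(child_haplotypes, mother_genotypes, father_genotypes):
    """Check whether a child's phased haplotypes fit the parents' genotypes.

    child_haplotypes: list of (allele_on_copy_1, allele_on_copy_2), one per variant.
    mother_genotypes, father_genotypes: list of unphased allele pairs per variant.
    Assumptions: no recombination across the region and no de novo mutations.
    """
    def could_transmit(copy_index, parent_genotypes):
        # Every allele on this copy must be present in the parent's genotype.
        return all(
            alleles[copy_index] in parent
            for alleles, parent in zip(child_haplotypes, parent_genotypes)
        )

    return (
        could_transmit(0, mother_genotypes) and could_transmit(1, father_genotypes)
    ) or (
        could_transmit(0, father_genotypes) and could_transmit(1, mother_genotypes)
    )


# Hypothetical three-variant example: two mutations on one copy, one on the other.
child = [(1, 0), (1, 0), (0, 1)]
mother = [(0, 1), (0, 1), (0, 0)]
father = [(0, 0), (0, 0), (0, 1)]
print(consistent_with_parents(child, mother, father))  # True

# The same variants with a different (incorrect) phasing fail the check.
misphased_child = [(1, 0), (0, 1), (0, 1)]
print(consistent_with_parents(misphased_child, mother, father))  # False
```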
The GemCode platform was also able to identify and phase structural variants. For instance, the data clearly showed a 50-kb deletion present in the child. "You can see the child has a mix of two haplotypes," Schnall-Levin said. The 50-kb deletion was only located on one arm, and the company confirmed it using the sequence data from the mother and father.
The company also highlighted several results it has generated in collaboration with some early customers. 10X Genomics has been collaborating with Hanlee Ji, an assistant professor of medicine at Stanford University, on structural variant discovery in tumor DNA. Ji used fresh frozen DNA from a previously characterized sample. The sample had a known copy number variant, but researchers had not been able to place that variant within the genome. The 10X Genomics data, however, "showed clearly a tandem duplication in a cis relationship," Schnall-Levin said. Ji declined to comment due to a forthcoming publication.
David Jaffe, who was previously at the Broad Institute but recently joined 10X Genomics, used the company's technology to de novo assemble a tumor genome. Jaffe, who reported on data at the AGBT conference in February, used the technology to piece together four molecular populations within the same tumor genome, Schnall-Levin said. Linked reads from one barcode are first used to define the pathway, using only those reads that can be placed uniquely. Then, the reads with the same barcodes as those that define the path are placed over that initial pathway to "disambiguate the ambiguous parts of the path," Schnall-Levin said.
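The general idea of using shared barcodes to resolve ambiguous placements can be shown with a toy Python example. This is not Jaffe's assembly algorithm, whose details the webinar did not cover; it is a simplified stand-in in which an ambiguous read is assigned to whichever candidate location has the most uniquely placed reads carrying the same barcode. All names, barcodes, and locations are hypothetical.

```python
from collections import Counter, defaultdict


def resolve_ambiguous(unique_reads, ambiguous_reads):
    """Place ambiguous reads using uniquely placed reads with the same barcode.

    unique_reads: iterable of (barcode, location) for reads with one placement.
    ambiguous_reads: iterable of (read_id, barcode, candidate_locations).
    Returns {read_id: chosen location, or None if the barcode gives no clear answer}.
    """
    # Step 1: tally where each barcode's uniquely placed reads landed.
    support = defaultdict(Counter)
    for barcode, location in unique_reads:
        support[barcode][location] += 1

    # Step 2: for each ambiguous read, prefer the best-supported candidate.
    placements = {}
    for read_id, barcode, candidates in ambiguous_reads:
        scores = [(support[barcode][location], location) for location in candidates]
        best = max((score for score, _ in scores), default=0)
        winners = [location for score, location in scores if score == best]
        placements[read_id] = winners[0] if best > 0 and len(winners) == 1 else None
    return placements


# Hypothetical input: two barcodes with unique anchors, one barcode without any.
unique = [("BC1", "locusA"), ("BC1", "locusA"), ("BC2", "locusB")]
ambiguous = [
    ("read1", "BC1", ["locusA", "locusB"]),
    ("read2", "BC2", ["locusA", "locusB"]),
    ("read3", "BC3", ["locusA", "locusB"]),
]
print(resolve_ambiguous(unique, ambiguous))
```

In this toy case, read1 and read2 are placed at locusA and locusB respectively, while read3 stays unplaced because its barcode has no uniquely placed reads to anchor it.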
Stephan Schuster, a professor at the Nanyang Technological University in Singapore, told GenomeWeb that his lab has worked with 10X Genomics to generate data, which has "looked terrific." He said the N50 haplotype blocks were "above our expectations."
In the future, he thinks the technology could be a standard addition to whole-genome sequencing. "Roughly, another $500 gives you a fully phased high-quality genome," he said. The analysis is also straightforward, he said, with no major obstacles. Anyone who already has a pipeline for SNP calling from Illumina sequence data can handle the 10X Genomics analysis, he said.
Schuster anticipated that 10X Genomics would give other long-read sequencing technologies a run for their money. Schuster said his lab also runs Pacific Biosciences' RS II system, and while that system delivers very long reads, it cannot compare on price. "The up-front investment and the actual sequencing cost is completely outside of what's applicable to normal, routine, human genome sequencing," he said. He estimated that sequencing a whole human genome on the RS II would cost several hundred thousand dollars. By contrast, a human genome using the GemCode and a HiSeq X Ten could cost around $2,000, he said.