NEW YORK (GenomeWeb) – Scientists at 10X Genomics and Stanford University have used the company's linked-read sequencing technology to haplotype human germline and cancer genomes from nanograms of input DNA.
In a study published in Nature Biotechnology today that was originally submitted last May, they demonstrated the use of the company's GemCode platform to generate haplotype blocks for a family trio, to phase a set of structural variants, to resolve the structure of a gene fusion in a cancer cell line, and to assign genetic aberrations in a primary colon cancer sample to specific haplotypes. Some of the data were presented last summer during a company-sponsored webinar.
"This comprehensive genome-scale view can be used to identify causative germline variants in heritable disorders and provide a deeper understanding of the genomic alterations underlying tumor development and maintenance in cancer patients," said Hanlee Ji, senior author of the study and an associate professor of medicine at Stanford, as well as a director of the Stanford Genome Technology Center, in a statement.
10X Genomics uses microfluidic technology, first publicly presented a year ago, to distribute nanogram amounts of genomic DNA across more than 100,000 droplets, where it is combined with barcoded primers in gel beads, followed by amplification. After being released from the droplets, the DNA undergoes library preparation for sequencing on an Illumina sequencer. The barcodes are then used to computationally link sequence reads that originated from the same DNA molecule, allowing researchers to phase genome variants.
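The barcode-linking step can be illustrated with a minimal Python sketch. It is not the company's software: the function name `link_reads`, the `(barcode, chrom, pos)` read tuples, and the 50-kilobase gap threshold are assumptions chosen for illustration. Reads that share a barcode and lie close together on a chromosome are treated as coming from one input molecule.

```python
from collections import defaultdict

def link_reads(reads, max_gap=50_000):
    """Group (barcode, chrom, pos) reads into inferred molecules.

    Hypothetical sketch: reads sharing a barcode and chromosome are sorted
    by position, and a new molecule starts whenever the gap to the previous
    read exceeds max_gap (an assumed threshold, not a published value).
    Returns tuples of (barcode, chrom, start, end, read_count).
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules

reads = [
    ("AAAC", "chr1", 1_000),
    ("AAAC", "chr1", 21_000),
    ("AAAC", "chr1", 900_000),
    ("GGTT", "chr1", 5_000),
]
molecules = sorted(link_reads(reads))
```

In this toy input, the two nearby "AAAC" reads are linked into one molecule, while the distant "AAAC" read and the "GGTT" read each form their own.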
To assess the performance of the technology, the scientists analyzed samples from a HapMap parent-child trio — NA12878, NA12877, and NA12882 — using about 1 nanogram of DNA from each sample. After processing them on a 10X GemCode instrument and preparing sequencing libraries, they sequenced the genomes to 30x mean coverage on an Illumina HiSeq 2500.
They obtained a mean of 15 linked reads per DNA molecule and determined that the original molecules were at least 40 kilobases and up to 200 kilobases in size. Overall, they phased more than 95 percent of single nucleotide variants in all samples, with N50 phase block sizes ranging from 0.8 megabases to 2.8 megabases.
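For readers unfamiliar with the N50 statistic, the following Python sketch shows one common way to compute it: the block length at which blocks of that size or larger account for at least half of the total phased length. The function name `n50` and the example block lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Blocks are sorted from longest to shortest; the N50 is the length of
    the block at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")

# Hypothetical phase block lengths in megabases.
example_n50 = n50([2, 3, 4, 5, 6])
```

Here the blocks total 20 megabases, and the two longest (6 and 5) together pass the 10-megabase halfway point, so the N50 is 5.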
The linked-read data could also be used to phase de novo variants, they showed, but this requires more coverage to achieve parity with standard library prep methods due to coverage biases against GC-rich regions.
After enrichment, they also sequenced the exome of the three barcoded samples at greater than 185x depth and found that the haplotype blocks they discovered were consistent with Mendelian inheritance across the family trio. In addition, they used the linked-read data to call breakpoints in large-scale structural variants and to assign them to specific haplotypes.
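A check for Mendelian consistency can be sketched at the level of a single variant site. This is a simplification of the block-level comparison described above, and the function name and genotype encoding (pairs of 0 for the reference allele and 1 for the alternate allele) are assumptions for illustration only. A child's genotype is consistent if one allele can be attributed to each parent.

```python
def mendelian_consistent(child, mother, father):
    """Return True if the child's genotype could be inherited from the parents.

    Each genotype is a pair of alleles, e.g. (0, 1). The child is consistent
    when one of its alleles is present in the mother and the other in the
    father. De novo mutations and genotyping errors would fail this check.
    """
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

# Hypothetical genotypes at one site.
ok = mendelian_consistent(child=(0, 1), mother=(0, 0), father=(1, 1))
bad = mendelian_consistent(child=(1, 1), mother=(0, 0), father=(0, 1))
```

The first call is consistent because the child can take allele 0 from the mother and allele 1 from the father; the second is not, because the mother carries no copy of allele 1.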
Next, the researchers analyzed a lung cancer cell line, NCI-H2228, that is known to contain an EML4-ALK fusion. After preparing a barcoded sequencing library from 1 nanogram of DNA, they enriched the exome and sequenced it to an average coverage of 204x. From that, they correctly identified the fusion and inferred a refined structure for the rearrangement of the ALK and EML4 genes.
Finally, they studied a primary colon adenocarcinoma, in which they identified mutations, copy number variants, and rearrangements. Using 1 nanogram of DNA from each of the tumor and normal samples as input, they generated barcoded sequencing libraries and sequenced them to an average coverage of 30x. By combining somatic mutations, haplotype blocks, and barcode counting, they identified a trans relationship between a mutation in TP53 and a chromosome 17p loss.
"To our knowledge, this is the first study to demonstrate a droplet-based system for whole-genome phasing and structural variant analysis," the authors wrote. "In addition to phasing and structural variant calling, linked reads can potentially also be applied to de novo genome assembly, remapping of difficult regions of the genome, detection of rare alleles, and elucidation of complex structural rearrangements."
"The identification of potentially pathogenic mutations and structural variants remains a challenge, and linked-read sequencing provides an opportunity to improve the understanding of diseases such as cancer," they concluded.
At the JP Morgan Healthcare Conference last month, 10X Genomics said it has sold 43 GemCode platforms so far and plans to launch library kits for whole-genome and exome sequencing in the near future, as well as a new version of its platform that will be compatible with the Illumina HiSeq X Ten system.