SAN FRANCISCO (GenomeWeb) – Researchers at BGI, the China National Institutes for Food and Drug Control, and the State Food and Drug Administration have published whole-genome sequence data from the BGISEQ-500 instrument, evaluating its ability to sequence a human cell line and comparing the results with data generated on the Illumina HiSeq 2500 instrument.
BGI launched the BGISEQ-500 system, which harnesses the DNA nanoball technology originally developed by Complete Genomics, in China in 2015 and registered it with the China FDA last year.
BGI both sells the system and offers sequencing services on it out of its laboratory. The nonprofit China National GeneBank, founded last year by the Chinese government and BGI, runs 150 BGISEQ-500 instruments.
Also last year, a group from Saarland University in Germany evaluated microRNA sequence data that BGI had generated on the instrument. Now, the BGI team has published the first whole-genome dataset from the platform in the journal GigaScience.
In the study, the researchers sequenced the widely studied cell line NA12878. They performed two runs using 50 base paired-end reads and two runs with 100 base paired-end reads on the BGISEQ-500, and compared those data to the HiSeq 2500 using 150 base paired-end reads.
Andreas Keller, chair of clinical bioinformatics at Saarland University who last year tested the BGISEQ-500 for microRNA sequencing, said that the new whole-genome data is promising. "It is good to see this technology maturing," he said. "Researchers will thus have a greater choice of platforms for projects that require next-generation sequencing."
Stephan Schuster, professor at Nanyang Technological University in Singapore, added that it is "very positive and good for the field" that the BGI team has developed the platform to the point where it can "produce enough data for human genome sequencing."
While the BGISEQ-500 is based on Complete Genomics' technology, BGI made some modifications. For instance, although library construction still relies on DNA nanoballs, BGI switched from using four adapters to a one-adapter method, which simplified the library construction process, Xin Liu, lead author of the GigaScience study and bioinformaticist at BGI, said.
In addition, the researchers modified the core sequencing technology from a combinatorial probe-anchor ligation method to a combinatorial probe-anchor synthesis method. That switch is what enabled them to achieve the longer read lengths of 50 bp and 100 bp compared to the 35 bp read lengths on the Complete Genomics platform, Liu said.
The workflow for the BGISEQ-500 involves first fragmenting and size selecting the DNA. Fragments are then repaired and the 3' end is modified to include a dATP. An adapter sequence is then ligated to both ends and the fragments are amplified and circularized to obtain a single-strand circular DNA library. These libraries are then amplified to create DNA nanoballs. The BGISEQ-500 workflow also includes an automated sample prep machine, BGIDL-50. After sample prep, the DNA nanoballs are loaded onto patterned array flow cells for sequencing. According to BGI, the cPAS chemistry works by attaching a fluorescent probe to a DNA anchor on the DNA nanoball, and synthesis is captured using digital imaging. After sequencing, base calling is performed by Zebra call, software designed specifically for the instrument.
The BGI team compared the performance with that of the HiSeq 2500 on the same cell line and used the GATK pipeline for variation calling for both instruments.
The BGISEQ-500 generated more than 200 gigabases of data across the four lanes, including nearly 119 gigabases from the two lanes of 50 bp reads and just over 113 gigabases from the two lanes of 100 bp reads. The two platforms were comparable across a number of metrics, including mapping rates, coverage, and sequencing depth.
Unique mapping rates were slightly lower on the BGISEQ-500, likely due to the shorter read lengths, the authors noted.
Read quality was slightly lower on the BGISEQ-500 platform, but still good, with more than 96 percent of reads above a quality score of 20 and more than 87 percent of reads above a quality score of 30.
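For readers unfamiliar with quality scores, the thresholds above are conventionally Phred-scaled, where a score Q corresponds to an estimated error probability of 10^(-Q/10). The short Python sketch below illustrates that standard conversion. It assumes the scores reported in the study follow the usual Phred convention, and the function name is chosen here for illustration only.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an estimated base-call error probability.

    Uses the standard Phred definition P = 10 ** (-Q / 10). That the study's
    quality scores follow this convention is an assumption of this sketch.
    """
    return 10 ** (-q / 10)


# The two thresholds mentioned in the article
for q in (20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: estimated error probability {p:.3f}")
```

Under this convention, a score of 20 corresponds to an estimated 1-in-100 chance of an incorrect base call and a score of 30 to a 1-in-1,000 chance.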
The two platforms also performed comparably for SNP calling, with both calling just over 3 million true positive SNPs. Sequencing with 50 bp reads resulted in many more false positive SNPs compared to the 100 bp reads on the BGISEQ-500. Using 100 bp reads, the platform had 6,900 false positive SNPs, compared to 4,300 on the HiSeq 2500, and the BGISEQ-500 had 121,000 false negative SNPs compared to 108,000 on the HiSeq 2500.
Overall sensitivity and precision on the BGISEQ-500 using 100 bp reads were 96.20 percent and 99.78 percent, respectively, compared to 96.60 percent sensitivity and 99.86 percent precision on the HiSeq 2500.
Indel calling on the BGISEQ-500 was significantly worse than on the HiSeq 2500. Using the 100 bp reads, BGISEQ-500 called more than 326,000 true indels and more than 22,000 false positive indels, and missed just over 42,000 indel calls. The HiSeq 2500, meantime, called more than 355,000 true indels, had just under 8,000 false positives, and missed more than 13,000 indels.
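Sensitivity and precision follow directly from counts of true positives, false positives, and false negatives: sensitivity is the share of real variants that were called, and precision is the share of calls that were real. The Python sketch below applies those standard definitions to the rounded indel counts quoted above. Because the article gives only approximate figures ("more than," "just under"), the inputs are rounded assumptions and the outputs are rough estimates, not values reported in the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that are true: TP / (TP + FP)."""
    return tp / (tp + fp)


# Rounded indel counts taken from the article's approximate figures
# (assumption: "more than 326,000" is treated as 326,000, and so on).
indel_counts = {
    "BGISEQ-500 (100 bp)": {"tp": 326_000, "fp": 22_000, "fn": 42_000},
    "HiSeq 2500 (150 bp)": {"tp": 355_000, "fp": 8_000, "fn": 13_000},
}

for platform, c in indel_counts.items():
    sens = sensitivity(c["tp"], c["fn"])
    prec = precision(c["tp"], c["fp"])
    print(f"{platform}: sensitivity ~{sens:.1%}, precision ~{prec:.1%}")
```

With these rounded inputs, the estimate comes to roughly 89 percent sensitivity and 94 percent precision for indels on the BGISEQ-500, versus roughly 96 percent and 98 percent on the HiSeq 2500.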
Liu said that these differences were likely due to the differences in read lengths, as performance was worse when evaluating 50 bp reads on the BGISEQ-500.
"We had similar accuracy for SNPs," he said, "but since we are comparing 100 bp reads to 150 bp reads from Illumina sequencing, we think that's the reason we are less accurate for the indels."
Liu added that the researchers are working to further improve the system's accuracy by increasing both the read lengths and the read quality.
Schuster agreed with the BGI team's assessment of the quality. "For SNP variation, it's quite capable," he said. "But, as they showed, the technology suffers from the shorter read lengths. … If they manage to increase the read lengths to at least 150 bp, they will be much more competitive with Illumina products."
Schuster added that it would be interesting to see how the platform develops. For instance, he said he would be interested to see how BGI makes use of the DNA nanoball technology to differentiate itself from other commercial systems.
Going forward, Liu said that the firm planned to continue to develop the platform for additional applications. For instance, he said, researchers are currently preparing a publication demonstrating RNA sequencing, while internally the BGI team has done exome sequencing. In addition, he said the group is working to enable ChIP-seq, ATAC-seq, and other epigenomic applications.
The instrument also has CFDA approval, and Liu said that the clinical team is working to develop specific clinical assays on the system, including its noninvasive prenatal test, NIFTY, as well as cancer assays.