SAN FRANCISCO (GenomeWeb) – Two research groups, from the Wellcome Sanger Institute in the UK and the Garvan Institute of Medical Research in Sydney, Australia, have independently analyzed BGI sequencing technology for single-cell RNA sequencing, finding that the data is comparable to data generated using Illumina technology.
Both teams collaborated with BGI researchers and compared sequencing data produced at BGI with data they generated in house on Illumina instruments.
The Sanger team, along with researchers from BGI, tested two single-cell RNA-seq protocols on the Illumina HiSeq 4000 and the BGISEQ-500, and described the results this week in Genome Biology.
The Garvan and BGI team analyzed single cells processed by 10x Genomics' platform, sequencing them on the MGISEQ-2000 as well as Illumina's NextSeq 500 and NovaSeq 6000. That work is described in a paper published in February on the BioRxiv preprint server.
Overall, both research groups found the BGI sequencing data comparable with Illumina sequencing data and estimated that costs were lower on BGI's instruments based on available list prices.
The Sanger team estimated cost per gigabase of sequence data to be $20 on the BGISEQ-500 when running paired 100-base reads and $44 on the HiSeq 4000 with paired 100-base reads.
The Garvan team estimated that cost per million reads on the MGISEQ-2000 was A$2.53 ($1.82), while on the NextSeq it was A$12.53 and on the NovaSeq it ranged from A$4.35 to A$5.59, depending on the flow cell.
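As a rough check on these list-price estimates, the relative savings can be worked out directly from the figures quoted above. The short Python sketch below is illustrative only: the helper name `percent_savings` is made up for this example, and the calculation assumes the quoted per-gigabase and per-million-read prices are directly comparable within each study.

```python
def percent_savings(cheaper: float, pricier: float) -> float:
    """Return how much less `cheaper` costs than `pricier`, as a percentage."""
    return (1 - cheaper / pricier) * 100


# Sanger estimates: US$ per gigabase, paired 100-base reads
sanger_savings = percent_savings(20, 44)

# Garvan estimates: A$ per million reads, MGISEQ-2000 versus each Illumina option
MGISEQ_COST = 2.53
garvan_savings = {
    "NextSeq": percent_savings(MGISEQ_COST, 12.53),
    "NovaSeq (cheapest flow cell)": percent_savings(MGISEQ_COST, 4.35),
    "NovaSeq (priciest flow cell)": percent_savings(MGISEQ_COST, 5.59),
}

print(f"BGISEQ-500 vs HiSeq 4000: {sanger_savings:.1f}% lower")
for platform, saving in garvan_savings.items():
    print(f"MGISEQ-2000 vs {platform}: {saving:.1f}% lower")
```

On the Sanger figures this works out to a little under 55 percent, and on the Garvan figures to between roughly 42 percent and 80 percent, depending on the Illumina instrument and flow cell.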
Illumina declined to comment on the results of the studies.
Kedar Natarajan, lead author of the Sanger study and now an assistant professor at the University of Southern Denmark, said that the work is a continuation of previous research that was started while he was a postdoctoral researcher at the Sanger Institute. That work compared various single-cell RNA sequencing protocols and was published in Nature Methods in 2017.
Natarajan said the main takeaway from the Genome Biology study is that the two platforms are comparable. He noted, however, that the Sanger researchers did not produce the sequencing data on the BGISEQ-500. Those data were produced by BGI scientists in Shenzhen, while researchers from both organizations generated sequencing data on Illumina platforms.
In the study, the researchers created libraries using both the SMARTer and Smart-seq2 protocols and sequenced samples on the Illumina HiSeq 4000 and the BGISEQ-500 instruments. For the samples that were run on the BGISEQ-500 instruments, both BGI and Sanger Institute researchers performed the sample prep on matched samples, but the Sanger team then sent its own prepared samples to BGI, where all the sequencing was performed, Natarajan said.
For the samples sequenced on Illumina technology, both Sanger and BGI researchers prepared libraries and sequenced matched samples at their respective institutions, Natarajan said. The goal was to make sure that any variability seen between the sample prep protocols and sequencing technologies was not due to where the experiments were being done, he added.
The team did see some RNA degradation in a few of the single-cell libraries, which the authors attributed to the shipment of those samples. But, overall, "we didn't see a big variation due to the differences in handling the cells or preparing the libraries," Natarajan said.
In total, the researchers analyzed 1,297 matched cDNA samples from 468 single cells using two single-cell RNA-seq protocols. They incorporated spike-in samples in order to have a truth set to validate against.
The team found that fragment size distribution, read coverage over genes, dropout rates, and expression variation were similar for both sequencing platforms. They used the spike-ins to measure accuracy and sensitivity and found the metrics were similar across sequencing platforms, regardless of the single-cell protocol used. Detection limit ranged from 21 to 47 RNA molecules, with the limit being slightly lower for the BGISEQ-500 platform because those samples were sequenced at higher depth. When the team down-sampled the data to compare at similar sequencing depths, there was no difference. Also, when analyzing the data at 1 million reads per cell, the team found similar gene expression patterns from the two platforms.
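Down-sampling of this kind generally means randomly discarding reads from the deeper library until both libraries have the same total. The Python sketch below illustrates the general idea only; it is not the method used in the study, and the function name, gene names, and read counts are all hypothetical.

```python
import random
from collections import Counter


def downsample_counts(counts, target_depth, seed=0):
    """Randomly subsample reads without replacement down to `target_depth` reads."""
    total = sum(counts.values())
    if target_depth >= total:
        # Library is already at or below the target depth; nothing to discard
        return dict(counts)
    # Expand per-gene counts into one entry per read, then sample without replacement
    reads = [gene for gene, n in counts.items() for _ in range(n)]
    sampled = random.Random(seed).sample(reads, target_depth)
    return dict(Counter(sampled))


# Hypothetical per-gene read counts for one cell sequenced at higher depth
deep_library = {"geneA": 600, "geneB": 300, "geneC": 100}

# Bring it down to the depth of a hypothetical shallower library (500 reads)
matched = downsample_counts(deep_library, target_depth=500)
print(sum(matched.values()))
```

Once both libraries sit at the same depth, metrics such as detection limit can be compared without the deeper run gaining an advantage simply from having more reads.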
In analyzing "two different cell types, two different single-cell and sequencing technologies, and two different spike-ins, what we see is that the performance metrics — sensitivity of detecting an RNA and the accuracy — are comparable," Natarajan said.
The cost of sequencing on the BGISEQ-500 was around 40 percent to 60 percent less than sequencing on the HiSeq 4000, the researchers found.
Natarajan added that going forward, it would be important for additional studies to confirm the findings and for the BGI instrumentation to be evaluated outside of BGI.
Meantime, Joseph Powell, head of single-cell and computational genomics at the Garvan Institute, said that his team "did a fairly comprehensive analysis, and the take home message was that essentially, over a wide range of modalities of data comparison, the BGI system was almost directly comparable to NovaSeq." He noted, however, that BGI ran the samples at its own lab and that it would be important for researchers to test the systems externally, as well.
Powell added that while his lab does not currently own any of BGI's instruments, it is considering bringing the technology in house. In addition, he said, the researchers have so far only evaluated data from single-cell sequencing on the instrument.
"The main consideration is how this will fit into our workflow," he said, including the lab's LIMS, robotics, and other processes. "I personally favor having a range of technologies available so that we won't be a uni-platform [lab], and that's for a whole range of reasons."
In the BioRxiv study, the researchers analyzed more than 70,000 single cells. The Garvan researchers prepared all the libraries, generating three sets of paired single-cell libraries to compare the MGISEQ-2000 with the NovaSeq and running two experiments to compare the MGISEQ-2000 with the NextSeq 500.
Overall, the researchers found that the data from the MGISEQ-2000 and the NovaSeq were comparable, while the MGISEQ-2000 performed better than the NextSeq 500, identifying more cells, genes, and unique molecular identifiers (UMIs). Powell said that the NextSeq read quality tailed off over the length of the read, so "you have less reads that can be assigned to a cell, less that can be mapped back to the UMI, and so an overall decrease in the mapping efficiency." He did note that since the study was done, a newer version of the NextSeq chemistry has come out, which might make a difference in performance.
Powell echoed Natarajan's thoughts that going forward, it would be important for the BGI instruments to be tested outside of BGI, and said that he thought it was a good thing to have an alternative technology on the market.
"It's important for the community to see this work and to do this work in a manner that's deep enough to evaluate as many of the potential nuances in this type of data as possible," Powell said.
Thus far, most of the published studies evaluating BGI's sequencing technology have relied on sequence data produced by BGI, including a study published last year by another group from Australia evaluating the technology for sequencing cancer genomes. However, BGI's instrument division, MGI Tech, said in February that it plans to begin selling instruments in the US and Europe by the end of the year. Illumina, meanwhile, has already sued a BGI subsidiary in Germany for patent infringement.