Genome In a Bottle Consortium Developing Tumor-Normal Reference Materials

NEW YORK – The Genome in a Bottle Consortium (GIAB) is charting new territory by generating genomic benchmarking datasets for matched tumor-normal samples consented for public use and dissemination.

The goal of the effort, for which pilot data were published in a bioRxiv preprint last month, is to provide a resource for researchers developing new tools and methods for cancer somatic variant detection.

"This is a new area that we are focusing on in the Genome in a Bottle Consortium," said Justin Zook, a researcher at the National Institute of Standards and Technology and a coleader of GIAB. "We hope this is a good and open public resource that is useful to the community."

Previously, GIAB primarily focused on producing genomic reference datasets for healthy individuals, including structural variants and medically relevant genes.

To generate the tumor-normal dataset, the GIAB researchers deployed 13 whole-genome measurement technologies, including short- and long-read PCR-free whole-genome sequencing, single-cell WGS, Hi-C short-read WGS, optical genome mapping (OGM), and cytogenetic analysis, to characterize a tumor cell line and corresponding normal tissue samples from a patient diagnosed with pancreatic ductal adenocarcinoma. Labeled as HG008, the patient was a 61-year-old female of European genetic ancestry who was recruited at Massachusetts General Hospital.

Specifically, the consortium performed PCR-free short-read WGS using the NovaSeq 6000 platform from Illumina, the Aviti sequencer from Element Biosciences, Pacific Biosciences' Onso platform, and the UG 100 from Ultima Genomics.

For long-read WGS, they employed PacBio HiFi sequencing using the Revio platform as well as standard-length and ultra-long-read sequencing using Oxford Nanopore Technologies' PromethION device.

For bulk Hi-C short-read WGS, they performed Phase Genomics' CytoTerra and Arima Genomics' Arima-HiC assays using Illumina sequencing.

Additionally, to better understand heterogeneity among the normal and tumor cells, the researchers conducted single-cell WGS using BioSkryb Genomics' ResolveDNA kit paired with both Illumina low-pass sequencing and Ultima high-throughput sequencing.

Lastly, the GIAB team carried out OGM using Bionano Genomics' Saphyr system as well as cytogenetic analysis using G-banded karyotyping and directional genomic hybridization.

Despite the myriad methods used, Zook said, the goal at this point is not to evaluate or compare different technologies but rather to generate comprehensive and reliable data for future benchmarking efforts. To produce high-quality data, the GIAB team worked closely with the companies that developed the technologies, he added, some of which also contributed data to the study.

The cancer benchmarking data are publicly available on the National Center for Biotechnology Information (NCBI) FTP site and from NCBI's Sequence Read Archive (SRA), Zook noted.

"This is probably the most well-characterized cancer cell line in the world right now," said Giuseppe Narzisi, associate director of computational biology at the New York Genome Center (NYGC) and a coauthor of the GIAB preprint. "The vision is that with all these evaluations and measurements, this cancer cell line will become the state-of-the-art benchmarking cell line that will help the future development of new tools for cancer analysis."

Narzisi's team helped generate some of the Illumina sequencing data for the study and performed variant calling on the dataset using Lancet2, a new iteration of the Lancet somatic variant caller developed by NYGC researchers. The variant calling effort, which represents the next phase of the GIAB project, was not described in the current preprint, he noted.

As the NYGC team helps flesh out the GIAB cancer somatic variant reference, the dataset will in turn help evaluate and benchmark the performance of Lancet2, Narzisi pointed out.

"I'm really excited that they have done this," said Winston Timp, a biomedical engineering professor at Johns Hopkins University who was not involved in the study. "For method developers like me, it is very useful because it means that we have something to refer back to" when testing new approaches.

Timp also praised GIAB's effort to include methylation data from long-read sequencing, which he considers a "really powerful" resource for epigenetic analysis. Transcriptomic data is next on his "wish list" as GIAB continues to expand its benchmarking efforts for the tumor-normal samples, he added.

Moving forward, Zook said, the consortium will dive deeper into the genomic data generated for the HG008 samples and comprehensively characterize the somatic variants, which will be made public as they become available. The team will start with curating a set of somatic structural variants toward forming an initial benchmark, he noted.

While GIAB was able to generate an immortalized cancer cell line for HG008, the team has so far failed to achieve this for the matched normal cells, so the normal tissue will be a finite resource. For that reason, Zook said, the normal tissue for HG008 will not be publicly available. However, he noted that there is "one more small probability" that the team can establish a cell line from normal cells in the primary tumor, and GIAB plans to attempt this in the coming months.

GIAB is currently working with public cell line repositories to make the HG008 cell line publicly available, he said, though the consortium does not have a firm timeline on that yet.

Besides HG008, GIAB collaborators are working to establish tumor and normal cell lines from another pancreatic cancer patient recruited at Mass General. The team will likely repeat many of the measurements mentioned in the current study on the new patient samples, Zook noted, adding that it took about 12 months to generate, QC, and make the data public for HG008.

The ultimate goal is to have immortalized cell lines for matching cancer and normal cells available, he said, serving as enduring reference samples for the research community to benchmark somatic variants.

Benchmarking cancer somatic variants is posing new challenges for GIAB, Zook said, as tumor cells tend to be heterogeneous and can keep mutating over time. One way to potentially control for heterogeneity is to grow large batches of cells and analyze their genomic DNA collectively. While this approach may work with bulk sequencing, it may not be compatible with single-cell analysis, he noted.

Another approach the team is exploring is single-cell cloning of the tumor cell line, in which individual cells are isolated and cloned in hopes of obtaining a more homogeneous and stable cell line. Still, this method is "probably not a perfect solution" either, Zook said, given that cancer cells can continue mutating as time goes on.

"There are definitely some challenges there that we have to work through," he said. "We will hopefully have some measures on that before too long."