NEW YORK – A group of 11 laboratories from the international Cancer Moonshot consortium has completed an effort to standardize data-independent acquisition mass spectrometry workflows at sites around the world.
Described in a study published last week in Nature Communications, the project aimed to show that DIA mass spec running non-stop for extended periods of time could generate reproducible proteomic data across multiple sites and demonstrate the suitability of the approach for distributed, large-scale clinical studies.
Interest in this kind of capability is growing as the proteomics field increasingly looks to play a role in the sort of population-level studies genomics has been tackling for some time, said Thomas Conrads, director of women's health research at Inova Fairfax Hospital and senior author on the study.
DIA's reproducibility and relatively high throughput have made it the technology of choice for many looking to do proteomics in large clinical cohorts. Researchers at centers like the Australian Cancer Research Foundation International Centre for the Proteome of Cancer (ProCan) and the University of Manchester's Stoller Biomarker Discovery Centre have in recent years used the approach to analyze tens of thousands of clinical samples.
However, as Conrads noted, this sort of large-scale DIA work has largely been concentrated in a relatively small number of expert facilities.
"If we are going to move towards population proteomics, we are going to have to be able to do distributed analyses," he said, adding that getting deep proteomic analysis on a significant number of diseases would require more mass spec resources than even the largest proteomic centers can provide.
He cited the trajectory of genomics research over the past two decades as an example of where he believed proteomics needed to go, noting that while in the early days of the technology major genomics projects were primarily done at large centers specializing in the field, "now we appreciate that genomics has enjoyed quite a bit of harmonization and stabilization, and you have the ability to run your, say, Illumina next-gen sequencer almost identically whether you are at the Broad [Institute] or at a small Midwestern university."
In the Nature Communications study, Conrads and his co-authors developed a quality control system for monitoring the performance of DIA mass spec experiments and applied it to a high-throughput DIA workflow run continuously for seven days at 11 sites around the world analyzing a standard made of Escherichia coli, yeast, and human cell line proteins and clinical samples of well-characterized ovarian cancer tissue.
The researchers used a 60-minute capillary flow LC gradient and Thermo Fisher Scientific Q Exactive HF instruments. To establish standards for system performance, four reference labs ran a QC standard consisting of a peptide digest from a HeLa cell line continuously for several days on LC-MS platforms operating at different levels of performance. From these data, the researchers determined what levels of system performance produced acceptable data and set parameters for acceptable performance accordingly.
The 11 participating labs then analyzed the test samples, running their systems continuously for seven days and running a QC standard each day to establish that their systems were performing at acceptable levels.
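The gating logic described above — comparing each day's QC run against performance bounds derived from the reference labs — can be sketched roughly as follows. This is an illustrative example, not code from the study; the metric names and threshold values are hypothetical.

```python
# Illustrative sketch (not from the study): gating a daily QC run against
# performance thresholds derived from reference-lab data.
# Metric names and threshold values below are hypothetical examples.

QC_THRESHOLDS = {
    "peptide_ids": (38000, None),    # minimum identified peptides
    "mass_error_ppm": (None, 5.0),   # maximum mass error, ppm
    "rt_drift_min": (None, 1.0),     # maximum retention-time drift, minutes
}

def passes_qc(run_metrics: dict) -> bool:
    """Return True if every metric falls within its (min, max) bounds."""
    for metric, (lo, hi) in QC_THRESHOLDS.items():
        value = run_metrics[metric]
        if lo is not None and value < lo:
            return False
        if hi is not None and value > hi:
            return False
    return True

# A run within all bounds passes; one below the ID floor would be flagged,
# triggering the kind of stop-and-fix response Conrads describes below.
daily_run = {"peptide_ids": 41250, "mass_error_ppm": 2.1, "rt_drift_min": 0.4}
print(passes_qc(daily_run))
```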
Of the 11 labs, nine met the QC requirements throughout the course of the study. Of the other two labs, one suffered from poor chromatography throughout the study. The other lab, Conrads' facility at Inova, discovered a QC issue in the middle of the study that it found was attributable to a mass spec collision cell that required maintenance.
"Because of the standards and the QA/QC metrics… we were able to recognize that early on and stop production and address the problem and then ramp back up to get back to the benchmark," he said.
As hoped, the researchers found that their setup allowed for a high level of reproducibility both within labs and across laboratories. Within labs, more than 80 percent of the total proteins quantified from the protein standard mix were quantified on all days the sample was run. Looking across labs, roughly 80 percent (5,784) of the proteins were quantified across all of the participating labs, while 4,565 proteins were quantified by every lab on every day of the experiment.
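The cross-lab completeness figures reported above amount to an intersection over lab-day runs: a protein counts as fully reproducible only if it was quantified in every lab on every day. A minimal sketch of that bookkeeping, using made-up toy data rather than the study's results:

```python
# Illustrative sketch (hypothetical data): the fraction of proteins
# quantified in every lab on every day, as in the cross-site comparison.

# Map each (lab, day) run to the set of protein identifiers it quantified.
quantified = {
    ("lab_A", 1): {"P1", "P2", "P3"},
    ("lab_A", 2): {"P1", "P2"},
    ("lab_B", 1): {"P1", "P2", "P3"},
    ("lab_B", 2): {"P1", "P3"},
}

all_proteins = set().union(*quantified.values())        # seen anywhere
complete = set.intersection(*quantified.values())       # seen in every run
print(len(complete) / len(all_proteins))                # completeness fraction
```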
Three labs took part in the analysis of the ovarian cancer sample tissues. The analyses identified a total of 5,712 proteins from these samples, 3,808 of which were quantified in individual patient samples across the labs.
Conrads said that moving forward he and his collaborators hoped to do a more extensive project looking at clinical samples.
"What we want to do now is see if we can assemble a cohort of ovarian cancer patients or what-have-you and do it in a way that the samples are processed centrally then randomized and distributed to demonstrate that this can be done in a [clinical] cohort," he said, adding that he and Bernd Wollscheid, professor of health sciences and technology at ETH Zurich and an author on the Nature Communications paper, are currently working on plans for such a project.
The goal, Conrads said, is to progress DIA workflows and QA/QC processes to the point where a relatively inexpert lab like his is able to produce DIA data of the same quality as an expert lab like Wollscheid's.
In another development on the QA/QC front, researchers from the Institute for Systems Biology last week published a paper, also in Nature Communications, detailing a tool for evaluating the quality of the spectral libraries required for DIA experiments. The tool, called DIALib-QC, addresses an issue that has largely flown under the radar, said ISB researcher Robert Moritz, senior author on the study.
Beyond quality control efforts like the pair of Nature Communications studies, advances in DIA technology, particularly on the software side, are allowing for more widespread use of the technique. Most notably, deep-learning tools that predict spectral libraries have spared researchers the time and effort of building libraries themselves, significantly streamlining DIA workflows. This has made it easier for facilities like core labs to quickly and easily generate good DIA data.
Conrads suggested that while DIA technology will continue to improve, the Moonshot study indicates that it can be used effectively for large-scale, distributed proteomics projects right now.
"Is it prime time [for DIA]? I think it is," he said.