NEW YORK – The Association for Molecular Pathology has published new recommendations for the use of computer-simulated (in silico) data in validating next-generation sequencing data analysis pipelines.
Labs may use in silico data to supplement analytical validation of NGS bioinformatics pipelines, particularly to assess analytical sensitivity or false negative rates for specific variants, the guidelines say. However, they should not supplant the use of patient specimens or other physical samples.
"As more laboratories around the country use in silico data to simulate variants to help validate the performance of clinical NGS data analysis pipelines, clinical laboratory professionals may need an aid for understanding both the value these methods bring and the important nuances and limitations of these approaches," Justin Zook, co-chair of the AMP In Silico Pipeline Validation Working Group and co-leader of the Biomarker and Genomic Sciences Group at the National Institute of Standards and Technology, said in a statement.
The consensus recommendations were issued in conjunction with the Association for Pathology Informatics and the College of American Pathologists (CAP) and published last week in the Journal of Molecular Diagnostics.
The paper "provides useful recommendations to help clinical laboratory professionals select the most appropriate format for their specific purpose," Zook added.
The recommendations come as some healthcare payors are pushing for validation of NGS-based tests with in silico methods, especially in oncology, while some labs are pushing back against such demands. The argument in favor is that such methods can be useful in assessing analytical validity of tests because they can query a greater diversity of variant types than might be found in physical samples.
CAP has used in silico challenges in proficiency testing; however, the specific variants used in the SPOTDx pilot program hosted by Tapestry Networks, and their applicability to real-world testing scenarios, have sparked controversy.
The guideline paper presents survey data on current usage and potential future applications. Of 61 respondents, about a third were already using in silico data, while only 18 percent had no plans to use them at the time of the survey. Approximately 28 percent of survey participants reported making changes to their pipelines based on the results of in silico work.
The In Silico Pipeline Validation Working Group also led a literature review and analysis of the different types of data and how they're used in clinical molecular diagnostic laboratories.
In silico datasets can include purely simulated data or manipulated data from real samples. These datasets may create "a range of variants that may be difficult to obtain from a single physical sample," the authors wrote in the paper. "Such data allows laboratories to more accurately test the performance of clinical bioinformatics pipelines without sequencing additional cases. For example, clinical laboratories may use in silico data to simulate low variant allele fraction (VAF) variants to test the analytical sensitivity of variant calling software or simulate a range of insertion/deletion sizes to determine the performance of indel calling software."
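As a rough illustration of the low-VAF example the authors describe, the sketch below simulates variant-supporting read counts at a fixed sequencing depth and estimates how often a simple threshold-based caller would detect the variant. It is not taken from the AMP paper: the function names, the depth of 500, and the caller thresholds (`min_alt_reads`, `min_vaf`) are hypothetical, and the model covers only binomial sampling of reads, not the sequencing or mapping errors a real pipeline has to handle.

```python
import random


def simulate_alt_reads(depth, vaf, rng):
    """Draw the number of variant-supporting reads at one site.

    Each of the `depth` reads independently carries the variant with
    probability `vaf` (binomial sampling; sequencing error is ignored).
    """
    return sum(1 for _ in range(depth) if rng.random() < vaf)


def toy_caller(alt_reads, depth, min_alt_reads=5, min_vaf=0.02):
    """Hypothetical caller: report a variant only if both thresholds are met."""
    return alt_reads >= min_alt_reads and alt_reads / depth >= min_vaf


def estimate_sensitivity(depth, vaf, n_sites=2000, seed=0):
    """Fraction of simulated true-variant sites that the toy caller detects."""
    rng = random.Random(seed)  # fixed seed so repeated runs are reproducible
    detected = sum(
        toy_caller(simulate_alt_reads(depth, vaf, rng), depth)
        for _ in range(n_sites)
    )
    return detected / n_sites


if __name__ == "__main__":
    # Sweep a few simulated VAFs at an assumed depth of 500 reads.
    for vaf in (0.01, 0.02, 0.05, 0.10):
        sensitivity = estimate_sensitivity(depth=500, vaf=vaf)
        print(f"VAF {vaf:.2f}: estimated sensitivity {sensitivity:.3f}")
```

A laboratory would more likely spike simulated variants into real or synthetic read files and run its actual pipeline; this toy only shows why sensitivity is expected to fall as the simulated VAF approaches the caller's thresholds.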
Additional recommendations include suggestions that labs understand the limitations of in silico data, especially for "assessing pipeline performance in particular genome contexts and variant types susceptible to systematic sequencing and mapping errors," AMP said in a statement. Labs may consider using such data for minor updates to clinical bioinformatics pipelines. In addition, commercial vendors should include options to make it easier to import and analyze in silico data.
The authors further suggested that in silico data could be useful for validating copy number variant detection, because such variants are less common than single-nucleotide variants. They also identified opportunities for in silico data to help with gene fusion assessment, as well as with the analysis of RNA sequencing data, clonality, tumor mutational burden testing, and other sequencing methods.