A team of undergraduate students participating in the annual International Genetically Engineered Machine competition is developing software to help gene synthesis companies and their customers detect the possible use of manufactured DNA as a bioterrorism agent.
The students, from Virginia Tech and ENSIMAG, an engineering school in Grenoble, France, are developing algorithms and assessment tools based on federal guidelines for synthetic genomics issued last year.
The voluntary guidelines, published in November by the Office of the Assistant Secretary for Preparedness and Response within the Department of Health and Human Services, recommended that producers of synthetic genomic products use computational methods to screen manufactured DNA for biosecurity threats, but did not provide any implementation details.
For their project, the students are assessing the screening strategies outlined in the guidelines and developing an implementation of these methods. The implementation, called GenoTHREAT, is designed to assess how similar a specific DNA sequence is to entries in the Centers for Disease Control and Prevention's Select Agent and Toxin List. They are also compiling a database of test cases that will allow them to assess the performance of different screening strategies.
“They are building a small database of a few hundred sequences annotated with their expected screen outcome,” Jean Peccoud, an associate professor in charge of the project at Virginia Tech's Virginia Bioinformatics Institute, told BioInform. “This makes it possible to compare the program output with the expected output to make sure that the program is correct.”
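The comparison Peccoud describes can be pictured as a small regression harness. The Python sketch below is only an illustration under assumptions: the `ScreenCase` record, the `evaluate` helper, and the toy screening function are hypothetical names and are not taken from GenoTHREAT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenCase:
    """One annotated test sequence (hypothetical record layout)."""
    name: str
    sequence: str
    expected_flag: bool  # True if the screen is expected to flag the sequence


def evaluate(screen, cases):
    """Run `screen` on every case and return the names of the mismatches."""
    return [c.name for c in cases if screen(c.sequence) != c.expected_flag]


def toy_screen(sequence):
    """Toy stand-in for a real screen: flags any sequence containing a marker motif."""
    return "GATTACA" in sequence.upper()


cases = [
    ScreenCase("benign_1", "ATGCCCGGGTTT", False),
    ScreenCase("flagged_1", "ccgattacagg", True),
    # Deliberately annotated "wrong" to show how a disagreement is reported.
    ScreenCase("mislabelled", "ATGGATTACA", False),
]

mismatches = evaluate(toy_screen, cases)
print(mismatches)
```

An empty list would mean the program output agrees with every annotation; any name in the list points to a case where either the screen or the annotation needs a second look.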
The team plans to present the final results of its analysis at the iGEM synthetic biology competition in November and also to submit the work for publication in a peer-reviewed journal.
“Screening Framework Guidance for Synthetic Double-Stranded DNA Providers,” published last year by HHS, was intended to “provide guidance to producers of synthetic genomic products regarding the screening of orders so that these orders are filled in compliance with current US regulations and to encourage best practices in addressing potential biosecurity concerns.”
Peccoud said that his decision to investigate the bioinformatics aspect of the guidelines came out of a workshop he attended earlier this year organized by the American Association for the Advancement of Science on reducing the risks of synthetic DNA, where the guidelines were discussed.
“I left the workshop thinking there is this high-level guidance but really no indication on how it should be implemented,” he said. “The gene synthesis companies are pretty much left by themselves to try to figure out how to implement it.”
He continued, “Since I really didn’t have any funding to do [the implementation], the cheapest way was to get undergraduate students involved. We came up with the idea of doing it as a project for iGEM.”
The guidelines suggest two screening approaches to analyze synthetic DNA sequences. The first approach involves comparing a synthetic gene sequence to a curated reference database of known pathogenic sequences and other genomic data.
This approach "requires the creation of databases identifying specific features such as known pathogenic sequences, virulence factors, house-keeping genes, etc.," HHS noted. "While the acquisition of such knowledge is progressing, at this time customized database approaches are unable to provide a robust solution that can be implemented by DNA synthesis providers."
The second approach, which the guidelines suggest is a better option, uses a method called “Best Match” to flag synthetic sequences that are closely related to harmful sequences by identifying nucleic acids that are unique to those sequences.
In this approach, a query sequence is considered to be unique to a pathogenic organism if the sequence "is more closely related to a select agent or toxin sequence than to a non-select agent or toxin sequence," the guidelines state.
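The quoted rule can be read as a comparison between the best alignment score against select agent sequences and the best score against everything else. The Python sketch below is one possible reading, not the guidelines' or the team's implementation: the `Hit` record and its fields are hypothetical, and treating a tie as "not unique" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit (hypothetical record layout)."""
    accession: str
    score: float           # higher means more closely related
    is_select_agent: bool  # whether the hit was classified as a select agent or toxin


def is_unique_to_select_agent(hits):
    """Return True if the query's best select agent hit beats every other hit.

    Assumption: a tie between the two groups does not count as
    "more closely related", so it returns False.
    """
    agent_scores = [h.score for h in hits if h.is_select_agent]
    other_scores = [h.score for h in hits if not h.is_select_agent]
    if not agent_scores:
        return False
    return not other_scores or max(agent_scores) > max(other_scores)


hits = [
    Hit("ACC_A", 180.0, True),   # placeholder accessions, not real records
    Hit("ACC_B", 150.0, False),
]
print(is_unique_to_select_agent(hits))
```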
To determine the best match, the guidelines suggest that sequences be broken into 200-bp nucleotide segments, that each segment undergo a “six-frame translation,” and that the resulting fragments be aligned to sequences in the GenBank protein sequence database using a sequence alignment tool such as Blast that utilizes both a global and local sequence alignment technique.
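The segmentation and translation step can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the windows are non-overlapping with a shorter final segment, the standard genetic code is used, and any codon containing an unrecognized base is written as `X`. The alignment step itself is not shown.

```python
from itertools import product

# Standard genetic code, codons ordered by base in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def segments(seq, size=200):
    """Split a sequence into consecutive, non-overlapping windows (assumption)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return the three forward and three reverse-strand translations."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


order = "ATGGCC" * 75  # 450-bp toy sequence
for segment in segments(order):
    frames = six_frame(segment)  # six protein fragments per segment
    print(len(segment), [len(f) for f in frames])
```

Each protein fragment produced this way would then be passed to the alignment tool; how the six results per segment are combined into a single decision is left open here.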
As part of implementing these approaches, Peccoud said that his team has developed keyword lists corresponding to different pathogens that help track down matches in the National Select Agent Registry.
“Blast returns GenBank accession numbers that need to be interpreted. Using these keywords, the GenBank records are retrieved,” he said. “We search the record for the presence of keywords representing the species of select agents. We also have anti-keywords to account for exceptions.”
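The keyword and anti-keyword check Peccoud describes might look something like the Python sketch below. It assumes the GenBank record has already been retrieved as plain text, and it uses simple case-insensitive substring matching. The function name and the example lists are illustrative only and are not the team's actual keyword lists.

```python
def record_matches_select_agent(record_text, keywords, anti_keywords):
    """Return True if the record mentions a keyword and no anti-keyword.

    Assumption: an anti-keyword anywhere in the record overrides any keyword.
    """
    text = record_text.lower()
    if any(anti.lower() in text for anti in anti_keywords):
        return False
    return any(key.lower() in text for key in keywords)


# Illustrative values only (hypothetical, not the project's lists).
keywords = ["bacillus anthracis"]
anti_keywords = ["sterne"]

record_a = "DEFINITION  Bacillus anthracis strain Ames, complete genome."
record_b = "DEFINITION  Bacillus anthracis str. Sterne, complete genome."
record_c = "DEFINITION  Escherichia coli K-12, complete genome."

for record in (record_a, record_b, record_c):
    print(record_matches_select_agent(record, keywords, anti_keywords))
```

Substring matching is the simplest choice and can over-match, for example when a keyword appears inside a longer word; a production screen would likely need stricter matching on specific record fields.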
In addition to evaluating the performance of the screening options outlined in the guidelines, Peccoud says that the team plans to find out whether there are alternate interpretations of the guidelines that might produce better results. They are also assessing the computational cost of screening DNA sequences.
“[We want to know] how many CPU hours it’s going to take to screen orders and [if] it is compatible with the operational constraints of the gene synthesis company,” he said. “That’s probably the biggest issue that may have been overlooked.”
For now, the test suite of sequences is located in an independent database, but Peccoud says that he is considering including these sequences in VBI's GenoCAD database to “evaluate our ability to detect them in a larger collection of DNA sequences.”
The GenoCAD genomic design software, developed by Peccoud’s lab, is a tool to design synthetic DNA molecules from genomic libraries.
He said that when GenoTHREAT is complete, he plans to use it to monitor sequences that users upload to the GenoCAD database and to identify users requesting accounts in GenoCAD because of “the remote possibility that GenoCAD may be used to design biological weapons.”
The biosecurity aspects of the project have also raised questions around making GenoTHREAT available as open source software.
“Our first reaction was to try to make [GenoTHREAT] open source but if we make it too easy for people to figure out how DNA sequences are being screened, [then] there are some concerns about [whether] it [will] enable them to circumvent the bioscreening effort,” he said.
“We will seek the advice of law enforcement agencies before making any licensing decision for GenoTHREAT,” he added.
Although the project began with no financial backing, Peccoud said that the students’ work is now funded by both Mitre and Science Applications International Corporation, but he declined to disclose how much the project received from the groups.