With the threat of bioterrorism on the rise since the Sept. 11 attacks in the US and the ensuing anthrax scare, the biological research community has faced serious questions about the release of potentially dangerous information. These concerns were formally addressed two weeks ago at the annual meeting of the American Association for the Advancement of Science in Denver, where 32 journal editors and scientists released a joint statement urging caution when security concerns outweigh the scientific benefits of publication.
While the statement did not directly address the release of potentially dangerous information via bioinformatics databases, the close ties between these repositories and journals raise the question of whether tighter controls on some bioinformatics resources are an inevitable next step.
“In terms of bioinformatics, the statement doesn’t really have a big effect except for the issue of data release,” said Steven Salzberg, senior director of bioinformatics at The Institute for Genomic Research (TIGR). Salzberg was the only bioinformaticist to sign the statement, which grew out of a workshop held Jan. 9 at the National Academy of Sciences.
Salzberg told BioInform that the customary practice for large genome centers — to release their sequence data before publication — is not likely to change in the face of increased caution, “but we recognize … that there might be specific projects for which the data should be held back.”
TIGR, which has a heavy microbial genomics focus, began grappling with this issue well before the fall of 2001, “but now we’re going to be a lot more careful,” Salzberg said. In the case of the institute’s comparison of two strains of Bacillus anthracis, published in Science last May, “we discussed whether our data release policy should be changed, but decided it wouldn’t really provide any particular benefit to hold the data back.” Salzberg added that the published paper would have easily complied with the recent guidelines laid out by the scientific journals. “There were no new genes in the attack strain,” he said. “It just had a few genetic markers that could be useful for forensic purposes and genotyping.”
The issue of novel genes may be one of the key determinants of whether data on potentially dangerous organisms is made broadly available. “Suppose if we sequenced some pathogen and we found that there was some particular gene that was unique to a particular strain that made it especially virulent. We might want to think very hard about whether we release that,” Salzberg said.
The Risk of Novelty
While TIGR hasn’t been faced with this scenario yet, those data providers who have been are taking a cautious approach. Mark Borodovsky, a bioinformaticist at Georgia Tech, opted to place his ViralGeneDB database under password-protected access following the 2001 anthrax attacks. The database contains viral genomes annotated with Borodovsky’s GeneMark gene prediction program, and the novelty of the genes spurred his decision to keep a closer eye on who had access to the information.
“The knowledge about genes may be the most important knowledge you have about viruses,” Borodovsky said. “We realized that if we have something that is new knowledge — and this knowledge is absolutely open, with no protection, to the whole world — that it’s a possibility that someone who is involved in high-tech bioterrorism may come and look into specific viruses that are very dangerous, and look into specific genes, which are perhaps the most dangerous things, and maybe even use [this information about] genes to introduce them into well-distributed viruses. This might be a lethal weapon in a sense.”
Borodovsky voluntarily put a system in place to grant access to potential users on a case-by-case basis. Researchers e-mail Borodovsky directly, “and if they are a faculty member in an established academic institution or an established company, we offer open access without any additional questions. But if it is some student from a problematic country or somebody who doesn’t identify himself, we don’t provide access.” Borodovsky noted that he has turned down a number of requests using this screening process. Once granted access, users can log on with a password, so the hundreds of researchers who use the database do not suffer any inconvenience beyond the initial request.
But while this approach may work for a database with hundreds of users, complications arise for larger-scale resources. “Just the administrative costs of maintaining controlled access is fairly high,” said Elliot Lefkowitz, a bioinformaticist at the University of Alabama at Birmingham, who maintains the Poxvirus Bioinformatics Resource (PBR). An additional hurdle to stricter controls, Lefkowitz said, “is who is going to review the applications, and what criteria are they going to use to review it?” Finally, he added, it’s very easy to falsify information via e-mail and time-consuming to track down potential impostors.
The PBR group has had “ongoing discussions for years as to whether all this should be out there in the public or whether it should be censored,” Lefkowitz said. Currently, the PBR reviews each piece of information it releases on a case-by-case basis, “but I think we’re always going to err on the side of openness in publication unless there is a strong, specific concern,” Lefkowitz said.
Salzberg said that the journal editors and scientists who crafted the recent policy statement considered the options of password protection and other controls, “but no one had an idea of how you could make that work. Most of the pathogens are already in GenBank, which means that countless thousands of people already have the data, so the genie is out of the bottle. You can’t put it back in.”
Another difficulty, Salzberg said, stems from the essential nature of bioinformatics data: It isn’t an end in itself, but is provided for biologists and other researchers to create follow-on products such as diagnostics, vaccines, and antibiotics. “If we don’t release it, we’d have to come up with some other way of getting the information to the people who are developing treatments, but that’s a large community and we don’t even know who they all are,” he said.
Funding Agencies to the Rescue?
The question of whether controls are placed on potentially dangerous biological data may ultimately be decided by the funding agencies. Salzberg said that sequencers generally propose a data release policy along with their grant request, “so if the funders want to modify that, they let us know, and we usually do modify it according to their wishes if we really want to do the project.”
The problem, however, is that different funding agencies mandate different levels of data access. “Some funding agencies want the data to be public, others don’t care, and some don’t want it to be made public, so that makes it pretty complicated,” said Scott White, a technical staff member at the Center for Human Genome Studies at Los Alamos National Laboratory who is studying microbial pathogens.
“We’ve been trying to get this resolved since the middle of last summer,” White said. “We have some guidelines, but no clear requirements. We have to live by what the sponsors tell us.”