Bioinformatics may have fallen out of favor with the investment community, but it’s providing a big payoff for the United States’ biodefense infrastructure. For the past three years, a small computational genomics group at Lawrence Livermore National Laboratory has played a key role in the development of pathogen detection technology for biosecurity applications, according to Tom Slezak, who heads up the biodefense informatics team at LLNL and spoke publicly about the group’s work for the first time last week.
In a talk at the TIGR/Jackson Laboratory Computational Genomics conference in Cambridge, Mass., Oct. 9, Slezak said the 11-person bioinformatics group at LLNL is responsible for computationally identifying DNA and protein signatures that can be used for RT-PCR pathogen detection tools. These systems are used to monitor the environment for the early detection of airborne bioterror agents in several large cities across the country.
A 25-year computational biology veteran, Slezak led the informatics efforts for the Joint Genome Institute’s work on the human genome project until 2000. Then, when the Department of Energy committed to providing biosecurity services for the 2002 Winter Olympics in Salt Lake City, LLNL called on Slezak to use informatics to solve a hairy problem. At the time, Slezak said, DOE biologists assumed it would be fairly easy to secure primers and probes from other labs that had already developed assays for a number of dangerous pathogens. But it turned out that most of the information was either classified or “owned” by academic researchers who refused to share their data, he said. With only two years to develop PCR diagnostics for a laundry list of pathogens (hardly enough time to start from scratch in the wet lab), Slezak had to develop a computational work-around.
Slezak’s team met its deadline and has successfully identified signatures for “all major pathogens for which whole genome sequence data is available” using what he claimed is the world’s only fully automated DNA signature pipeline. Called K-Path, the system can process the entire genome of a typical microbe in under two hours, generating a short list of possible signatures that must then be validated in the wet lab and then in the field. Backed by a 24-CPU Sun compute server and a 6-CPU Sun database server, K-Path knits together a series of algorithms that first identify the conserved regions of the genome and confirm which conserved regions are unique to that pathogen, and then winnow the list down to a set of signature candidates.
Slezak said that his team did write some of its own software for K-Path, but the heart of the system is built around two algorithms developed by Stefan Kurtz at the University of Hamburg in Germany. One, MGA (multiple genome aligner), was the first algorithm available for aligning multiple whole genomes, Slezak said. LLNL uses MGA along with several other multiple alignment algorithms to identify conserved regions for each new pathogenic genome. Then, Slezak said, K-Path uses Kurtz’s Vmatch algorithm to compare those “consensus” conserved regions against all other available microbial sequences to pick out the regions that are unique to that particular pathogen. One trick the LLNL team relies on is to use unique intergenic regions rather than coding regions as signatures, because these segments of the genome are “the least likely for terrorists to stumble upon and engineer around,” Slezak said.
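The three stages Slezak described (find conserved regions, keep only those unique to the pathogen, then winnow to candidates) can be illustrated with a toy sketch. This is not K-Path, MGA, or Vmatch: it swaps in exact k-mer matching for whole-genome alignment, and the function names, the GC-content filter, and its thresholds are hypothetical stand-ins chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(target_strains, background, k, gc_min=0.3, gc_max=0.7):
    """Toy signature search (hypothetical; not the K-Path method).

    Returns a sorted list of k-mers that occur in every target strain,
    occur in no background sequence, and pass a simple GC-content filter.
    """
    if not target_strains:
        return []

    # Stage 1: conserved k-mers are those shared by all target strains.
    conserved = set.intersection(*(kmers(s, k) for s in target_strains))

    # Stage 2: uniqueness. Drop any k-mer found in a non-target sequence.
    for seq in background:
        conserved -= kmers(seq, k)

    # Stage 3: winnow. A GC-content window stands in for real assay
    # design rules, which the article does not describe.
    def gc_fraction(s):
        return (s.count("G") + s.count("C")) / len(s)

    return sorted(c for c in conserved if gc_min <= gc_fraction(c) <= gc_max)


if __name__ == "__main__":
    strains = ["AAACGTACGGTT", "TTACGTACGGAA"]   # made-up target strains
    near_neighbors = ["CCACGTACCC"]              # made-up background genome
    print(candidate_signatures(strains, near_neighbors, k=6))
```

As in the real pipeline, anything this kind of search returns is only a candidate list; the article notes that signatures must still be validated in the wet lab and in the field.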
While K-Path has automated its signature selection pipeline, the process still presents formidable challenges, Slezak said. The issue of sensitivity vs. specificity is crucial, for example, since it’s important to detect very low doses of a pathogen, but the alarm bells can’t go off every time “some farmer has a single spore of anthrax on his shoe,” he said. The goal is to find regions that are conserved in all virulent strains of a pathogen, while being absent from all other organisms. This can be difficult for pathogens that have a great many strains, most of which have not yet been sequenced, and near-neighbors that may not be virulent.
In addition, Slezak said, RNA viruses such as Ebola have very high mutation rates, which makes it difficult to find adequate regions of conservation. Microbial genomes that are sequenced by “boutique” labs also pose a problem, because the quality of that sequence isn’t always very good. In those cases, “you can’t tell the difference between bad sequencing and natural variation,” he said. Algorithms for aligning draft genomes or fragments against complete genomes and against other draft genomes are also still lacking. David Hysom, an LLNL computer scientist, is working on that challenge, Slezak said.
Finally, Slezak noted, plain-old human politics can throw up additional barriers when “turf wars” break out over disease programs. In California, for example, there are four agencies responsible for studying West Nile virus: one for humans, one for birds, one for horses, and one for all other animals. Getting everyone to cooperate is often the hardest part of the job, he said.
When the anthrax attacks occurred in the US in October 2001, the LLNL group rushed its prototype pathogen detection system into live use on the East Coast within two days, Slezak said. By early 2003, multiple pathogen assays developed by LLNL and validated by the Centers for Disease Control and Prevention were put into daily use in several cities across the country as part of the Department of Homeland Security’s BioWatch program. One autonomous pathogen detection unit is placed smack in the middle of Times Square in New York, and looks like a discarded refrigerator. Slezak said that although the New York Police Department feared the system would be vandalized, dismantled, or shot at, the worst thing to befall the system is a scribbled note on the side asking, “What is it?”
Slezak said that the LLNL team is currently working on a much smaller version of the system it calls the “bio-briefcase.” Ultimately, the goal is to create a “bio-smoke alarm” for widespread use.
Most recently, the LLNL group provided its signatures for the SARS virus to the US Army Medical Research Institute of Infectious Diseases for testing. While unable to disclose details of other pathogens his group is studying, Slezak was able to sum up its mission succinctly: “If it affects humans, agriculture, or Congress, we need to know about it.”