BOSTON, Oct. 3 - Six months into a comprehensive three-year effort to map the polymorphisms that distinguish the various strains of Bacillus anthracis, TIGR investigator Steven Salzberg said his group had identified thousands of new SNPs that may help investigators pinpoint the source of any future anthrax attacks.
Having earlier this summer completed an analysis of the anthrax strain used in last fall's terror attacks--which identified about 60 SNPs and other genomic markers that distinguished the strain that killed five people in the
The project, led by TIGR investigator Timothy Read, aims to sequence the genomes of at least 14 strains of the microbe over the next two and a half years. TIGR has completed the sequencing of two additional strains--called the Kruger and
Although the final version of the Affymetrix-style chip for performing SNP assays will not be ready until the end of the three-year project, Salzberg told GenomeWeb that TIGR plans to create a preliminary version about halfway through the anthrax sequencing effort. To verify that the SNPs found in the various anthrax strains are genuine markers, TIGR must first run multiple PCR assays and resequence additional isolates of each strain.
More generally, Salzberg said the initial project comparing the anthrax strain used in the terror attacks with reference strains posed significant challenges for TIGR's bioinformaticists, because at the time TIGR had no measured value for the accuracy of its sequencing procedure. In more academic studies, which allow more time to find and correct sequencing errors, estimating the probability of an error had been less critical. "We had no good data from completed genomes," Salzberg said. "The sequencing error was never quantified."
Because TIGR had to make about 5 million pair-wise comparisons in its analysis of the strain used in the terror attacks, Salzberg said, it was critical for the significance of its results that TIGR show its sequencing error rate was below one in 5 million. Ultimately, TIGR calculated that it made, on average, one sequencing error in every 90,000 base pairs, a rate well above one in 5 million, but the researchers reported only SNPs obtained from "sequence regions of high confidence," Salzberg said.
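The arithmetic behind the one-in-5-million threshold can be made concrete with a back-of-the-envelope estimate: multiply the number of comparisons by the per-base error rate to get the number of differences that sequencing error alone would be expected to produce. The sketch below assumes errors occur independently at a uniform rate, which is a simplification; the function name is hypothetical and the two rates are the figures quoted in the article.

```python
# Back-of-the-envelope estimate of spurious "SNPs" caused by sequencing error.
# Assumption (a simplification): errors are independent and occur at a
# uniform per-base rate across every position compared.

def expected_false_snps(comparisons: int, error_rate: float) -> float:
    """Expected number of positions where a sequencing error alone
    would look like a genuine difference between strains."""
    return comparisons * error_rate

COMPARISONS = 5_000_000  # pair-wise comparisons cited in the article

# Average error rate TIGR calculated: one in every 90,000 base pairs.
observed = expected_false_snps(COMPARISONS, 1 / 90_000)

# The bar Salzberg described: fewer than one error in 5 million.
target = expected_false_snps(COMPARISONS, 1 / 5_000_000)

print(f"At 1/90,000:    ~{observed:.1f} spurious differences expected")
print(f"At 1/5,000,000: ~{target:.1f} spurious difference expected")
```

Under these assumptions, the average rate would yield roughly 56 spurious differences across 5 million comparisons, a number of the same order as the roughly 60 markers reported for the attack strain, whereas the one-in-5-million bar yields about one. That gap helps explain why restricting calls to high-confidence regions mattered.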
"For forensic analysis, you need a confidence value for every base pair," he added, "and we need to genotype every [potentially toxic microbe] before an outbreak occurs."
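One common way to attach a confidence value to every base call is a phred-style quality score, where Q = -10 log10(P_error). The article does not say which scoring scheme TIGR used, so the following is only a minimal sketch under that assumption: it keeps a mismatch between two aligned sequences as a candidate SNP only when both base calls meet a quality threshold. The function names, the Q40 default threshold, and the toy sequences are all hypothetical.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a phred-style quality score Q to the probability that
    the base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def high_confidence_snps(ref, ref_qual, query, query_qual, min_q=40):
    """Return (position, ref_base, query_base) for each mismatch where
    both base calls meet the hypothetical min_q threshold.
    Assumes the sequences are already aligned, gap-free, and equal length."""
    if not (len(ref) == len(ref_qual) == len(query) == len(query_qual)):
        raise ValueError("sequences and quality lists must be the same length")
    snps = []
    for i, (a, b) in enumerate(zip(ref, query)):
        if a != b and ref_qual[i] >= min_q and query_qual[i] >= min_q:
            snps.append((i, a, b))
    return snps

# Toy data (hypothetical): mismatches at positions 3 and 8,
# but the query base at position 8 has low quality.
ref = "ACGTACGTAC"
query = "ACGAACGTTC"
ref_q = [45] * 10
query_q = [45, 45, 45, 45, 45, 45, 45, 45, 20, 45]

print(high_confidence_snps(ref, ref_q, query, query_q))  # [(3, 'T', 'A')]
print(round(error_prob_to_phred(1 / 5_000_000), 1))      # 67.0
```

On this scale, an error rate of one in 5 million corresponds to a quality of about Q67, while one in 90,000 corresponds to roughly Q49.5, which illustrates how a per-base confidence value lets an analyst report only the positions that clear a chosen bar.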