The National Center for Biotechnology Information isn't the likeliest government agency to be associated with the terrorist attacks of Sept. 11, 2001, but the branch of the National Library of Medicine best known for GenBank and PubMed contributed its computational know-how to help identify victims in the aftermath of the attacks.
NCBI is currently preparing to roll out a production version of the software it developed in response to 9/11 to help identify victims of Hurricane Katrina, NCBI's Steve Sherry told BioInform. Sherry spoke publicly about the center's forensics activities at the Computational Genomics conference held in Cambridge, Mass., in early November.
Sherry runs the dbSNP polymorphism database at NCBI and served on the 24-member Kinship and Data Analysis Panel (KADAP) that the National Institute of Justice convened immediately after the World Trade Center attacks. KADAP was formed to assist the New York State Police and the New York City Office of the Chief Medical Examiner in developing new protocols and analytical frameworks for DNA-based victim identification.
The panel published a paper in Science on Nov. 17 outlining the methodology it devised for 9/11 and its recommendations for improving DNA identification in the event of future mass disasters.
Among seven key recommendations for future DNA identification projects, two involve informatics: "Software must be able to integrate analytical, database, and workflow functions," the panel wrote in the Science paper, and "Information technology infrastructure must be adequate to interconnect data-gathering, analysis, archiving, and reporting functions."
Sherry said that NCBI developed a software package called OSIRIS (Open Source Independent Review and Interpretation System) to address some of the informatics shortcomings highlighted by the KADAP panel. Specifically, OSIRIS was developed as a quality assurance software package that could validate the results of various commercial software products that were used in the WTC project.
At the time of the attacks, Sherry said, "no software existed" for quality assurance of forensic analysis, so there were no guarantees against experimental error, software glitches, interpretation mistakes, and inconsistencies in metadata. In addition, Sherry said, during the WTC identification, in which nearly 20,000 tissue fragments were analyzed to identify 2,749 victims, vendors were updating their identification software so rapidly that there was a very high risk of bugs.
Because of the necessity to protect against identification errors, the NIJ and NCBI signed a formal agreement in 2003 to support the development and deployment of OSIRIS as a public-domain quality-assurance tool for DNA forensic analysis. Desired features for the system, which evaluates microsatellite-typing data for genotype accuracy, included rapid quality assessment, automatic allele-calling, an open source licensing model, and the use of a standard data model, Sherry said.
NCBI has been co-developing the system with Applied Biosystems and SAIC, an IT consulting firm. Initial development has been limited to ABI's line of 377, 310, 3100, and 3700 genetic analyzers running ABI's GeneScan and GenoTyper software packages because this was determined to be the most popular system used in forensic DNA analysis.
Sherry said that the system is still in beta testing, but NCBI is "getting ready to put it out for real use," particularly in the victim-identification effort for Katrina, which poses "many of the same issues" as 9/11. NCBI "expects" to use OSIRIS in the effort, Sherry said, but he added that Mississippi and Louisiana are still hammering out the details of how they intend to pool their forensics resources, so the exact role of NCBI has yet to be determined.
In the case of the Katrina identifications, Sherry said, NCBI would use OSIRIS to "flag" results that appear to be problematic so that a human can take a second look at the results.
Recently, Sherry said, "there has been a tremendous focus on quality in the forensics community," so OSIRIS has been targeted to state crime labs and commercial DNA-typing services that are facing more pressure to guarantee that their analysis is accurate.
For example, in an initial run with an unidentified state forensics lab, the software identified eight mistakes in a single data set, a disturbing finding given that the lab in question claimed it had made only two errors in its entire history, Sherry said.
Sherry said that NCBI envisions OSIRIS results as one component of electronic "forensics profiles" for DNA testing records that will include information about how the data was collected, the context of the sample, the interpretation protocol used, and "concordance metrics." In addition, he said, it should be possible to capture the original electropherogram trace in a compressed file format so that forensics specialists with a question about the results won't have to request a fax from the lab. Currently, he said, forensics records include only the interpretation of the results, not the raw experimental data.
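To make the idea of an electronic forensics profile concrete, the record Sherry describes could be pictured roughly as follows. This is only an illustrative sketch: the class name, every field name, and the use of zlib compression are assumptions made for the example, not part of OSIRIS or of any NCBI data model.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ForensicsProfile:
    """Hypothetical record bundling an interpretation with its supporting data."""

    sample_id: str
    collection_method: str          # how the data was collected
    sample_context: str             # the context of the sample
    interpretation_protocol: str    # the interpretation protocol used
    concordance_metrics: Dict[str, float] = field(default_factory=dict)
    qa_flags: List[str] = field(default_factory=list)  # e.g. QA review results
    compressed_trace: bytes = b""   # raw electropherogram trace, compressed

    def attach_trace(self, raw_trace: bytes) -> None:
        # Store the raw trace in compressed form alongside the interpretation.
        self.compressed_trace = zlib.compress(raw_trace)

    def trace(self) -> bytes:
        # Recover the original trace bytes, or empty bytes if none was attached.
        if not self.compressed_trace:
            return b""
        return zlib.decompress(self.compressed_trace)


profile = ForensicsProfile(
    sample_id="sample-001",
    collection_method="example collection method",
    sample_context="example context",
    interpretation_protocol="example protocol",
)
profile.attach_trace(b"\x00\x01" * 1000)
```

Keeping the compressed trace in the same record as the interpretation is what would let a reviewer re-examine the raw data without going back to the lab.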
Currently, Sherry said, OSIRIS is equipped with more than 300 morphology rules to help identify experimental errors. The software can review 60,000 profiles in nine minutes with no false negatives and three false positives, he said. In a recent run of more than 59,000 human samples, 14 loci were flagged for review to reveal four actual errors.
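The figures Sherry cited can be put in more familiar terms with some quick arithmetic. The short sketch below uses only the numbers quoted above; the function names are made up for the example, and the calculation treats the 14 flagged loci and four confirmed errors as a simple ratio.

```python
def throughput_per_second(profiles: int, minutes: float) -> float:
    """Average number of profiles reviewed per second."""
    return profiles / (minutes * 60)


def flag_precision(confirmed_errors: int, flagged: int) -> float:
    """Fraction of flagged items that turned out to be actual errors."""
    return confirmed_errors / flagged


# 60,000 profiles in nine minutes
print(round(throughput_per_second(60_000, 9), 1))  # 111.1

# 14 loci flagged for review, four actual errors
print(round(flag_precision(4, 14), 3))  # 0.286
```

In other words, the quoted numbers work out to roughly 111 profiles per second, with a little under a third of the flagged loci in the 59,000-sample run proving to be real errors once a person reviewed them.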
Further information about the OSIRIS system and software downloads are available at http://www.ncbi.nlm.nih.gov/IEB/Research/GVWG/OSIRIS/index.htm.
Bernadette Toner