Bioinformatics technology may benefit from an unlikely pairing of the US intelligence community and the National Science Foundation.
The Intelligence Technology Innovation Center, a branch of the CIA, is adding $8 million over the next three years to existing NSF grants for research in data mining technology.
Spurred by the increased national security concerns surrounding last fall’s terrorist attacks, the ITIC is seeking new methods to detect patterns in huge data sets compiled from television broadcasts, web pages, e-mail, and other sources. The hope is that technology already being developed to extract knowledge from similarly large scientific data sets will expand the agency’s arsenal of tools.
All the research from the funded projects will remain as freely available as it would under any other NSF-supported grant, said Gary Strong, program officer in the NSF’s directorate for computer and information sciences and engineering. The grantees — 16 groups in all — will simply be applying the technology they developed for other scientific domains to new data sets provided by the ITIC.
The partnership falls under an interagency program called Knowledge Discovery and Dissemination, in which the NSF flags projects that might be related to national security for additional funding.
Strong said that several of the KDD grantees are already conducting biological informatics research under NSF ITR (Information Technology Research) grants, which are awarded for IT in support of a range of scientific fields.
For example, Fred Roberts of Rutgers University is currently funded under an ITR grant to develop computational models to analyze the spread and control of infectious diseases. This work in computational epidemiology landed him an additional KDD grant with the generic title “Monitoring Message Streams: Retrospective and Prospective Event Detection.”
In addition to supplementing existing computational biology research, Strong said the project’s support of general-purpose data mining research should also pay off for bioinformatics — in the form of better future technology options. One area of interest for the KDD research, he noted, is data mining techniques that protect the privacy of information, “and in the biological world, particularly when you’re dealing with clinical information, there’s a huge need for privacy-preserving data mining,” he said.
Lessons learned from new advances in computational linguistics under the program could also be applied to bioinformatics, Strong said. Noting that the hidden Markov model — one of the most effective methods for gene prediction — was borrowed from the speech recognition research domain, he added, “People have decided that some of the important coding regions are more difficult to find with hidden Markov models and they’re looking for other language modeling tools that may be more useful.” Pattern recognition approaches derived from the study of human language grammars are also of interest to those developing RNA- and protein-folding prediction tools, he said.
While the KDD projects were hand-selected, Strong noted that there’s plenty of funding available for bioinformatics under the ITR program. The program awarded around $250 million in grants last year, and Strong estimated that around 10 percent of that went to bioinformatics projects. The proportion for bioinformatics is expected to increase in 2003, he said, even if the total amount of funding for the program doesn’t.
Although the program announcement for the FY 2003 program (available at: www.nsf.gov/pubs/2002/nsf02168/nsf02168.htm) notes that the NSF expects to spend $145 million, “It’s expected by the foundation that that program will continue at the same level or slightly larger in the coming fiscal year,” he said, adding that all funding will depend on the final appropriation process by Congress.