Last July, the National Institute of Allergy and Infectious Diseases earmarked around $88 million to create eight national bioinformatics resource centers, or BRCs, to support research in biodefense and infectious disease.
So far, it appears that the NIAID's investment will likely pay off in the form of an interoperable nationwide network of bioinformatics tools and data. Over the last year, all eight centers have set up the basic hardware and software infrastructure required to establish a "presence on the Internet," Valentina Di Francesco, bioinformatics program director in NIAID's Division of Microbiology and Infectious Diseases, told BioInform. She added that project organizers are now setting goals for the next phase of the program, with an eye toward expanding the content available through each of the BRCs and improving interoperability across them.
As a first step toward advancing the latter objective, NIAID last week launched a web portal, called BRC-Central (http://www.brc-central.org/), which serves as a hub through which researchers will be able to access all eight BRC websites. BRC-Central "is not supposed to replace the analysis and the detailed data that you can get from the [individual] websites," Di Francesco said, "but it's really more of a tool that will allow users to facilitate finding things that they may have a query about."
The portal is the work of the BRC Interoperability Working Group, which comprises two members from each of the eight BRCs. The working group has also agreed on the GFF3 file format for exchanging sequence data, which all eight BRCs have adopted. Di Francesco said that longer-term goals for BRC-Central include a SourceForge-like repository for tools and materials from the centers, as well as improved methods for querying across all the BRCs.
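For readers unfamiliar with GFF3, the format is a tab-delimited text file in which each feature line has nine columns: seqid, source, type, start, end, score, strand, phase, and attributes. The Python sketch below shows one way such a line could be read. It is only an illustration and not code from any of the BRCs; the function name `parse_gff3_line` and the sample record are hypothetical.

```python
from urllib.parse import unquote

# The nine columns of a GFF3 feature line, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines and for comment or directive lines,
    which begin with '#'. This is a hypothetical helper, not a BRC tool.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None

    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")

    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Column 9 holds semicolon-separated tag=value pairs; a tag may carry
    # several comma-separated values, and reserved characters are
    # percent-encoded.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if not pair:
                continue
            tag, _, value = pair.partition("=")
            attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


# Made-up record for illustration only; it is not real BRC data.
sample = "contig1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo%20gene"
feature = parse_gff3_line(sample)
print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```

Because every center emits the same nine columns, a consumer written along these lines could in principle read feature files from any of the eight sites without site-specific parsing code.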
Di Francesco said that project officials are also in the process of determining additional data that will augment the genomic information that the BRCs already provide. Pathway and subsystem information and proteomics, microarray, and epidemiological data are all under consideration, she said.
The centers are currently identifying data that is already in the public domain that can be integrated with existing information, Di Francesco said, noting that this process will also require the BRCs to determine what types of information the pathogen research community requires — a process that isn't as easy as it appears, due to the diversity of the pathogens under study.
"To be honest, even if one center supports two or three different pathogens, the communities for those two or three different pathogens may need different things," Di Francesco said.
The BRCs were created to support NIAID's "high-priority" pathogens — a wide range of organisms including bacteria, viruses, parasites, and even invertebrate carriers of human pathogens, such as the mosquito — so the NIAID initially gave the BRCs a great deal of leeway in terms of the architecture and presentation of their online resources, Di Francesco said.
"It didn't make a lot of sense to force [the centers] to adopt any kind of look and feel because they all needed to focus on data types and information sources that were different," she said. "The initiative has been built to take into account the variety of the organisms that the BRCs will have to deal with." (See table below for a complete list of organisms supported by the NIAID BRCs.)
As a simple example, she noted, bacterial genomes don't have exons and introns, so the BRCs that focus on bacteria have different technical requirements from those that focus on viruses and parasites. Meanwhile, the VectorBase resource, which provides genomic information about vectors of diseases, must support much larger genomes than the other BRCs.
In addition, some of the BRC host institutions, such as TIGR and the University of Chicago, already had bioinformatics systems in place for managing large amounts of microbial genomic data, so these centers had a bit of a technical leg-up on their peers.
Di Francesco noted that the general goals of the BRC initiative are "very, very similar" to those of most model organism databases, and the technical challenges are more or less the same as well, with a few notable exceptions. First of all, she said, "model organisms typically have a wealth of information available."
In addition, she said, model organism databases "have large communities of people who work on them, while in many of the pathogens that we're working on, sometimes that's not the case." For example, she said, Bacillus anthracis "has a lot of people working on it," but organisms like Francisella tularensis have "a much smaller community."
Furthermore, the BRCs will have to cope with many more genomes than most model organism databases. "We're talking about possibly having to deal with metagenomics data, we're talking about having to deal with tens to hundreds of strains of microbial genomes that will be coming our way," Di Francesco said, "and that really [will require us] to create processes and tools and infrastructures that allow us to deal with that amount of data."
John Greene, director of bioinformatics at SRA International and principal investigator for the Enteropathogen Resource Integration Center BRC, admitted that he and his colleagues "are a little bit worried about that. These things have been multiplying like rabbits, and every time I look at the number it goes up." Greene said that the ERIC BRC is awaiting "a minimum of 53 genomes that are either in or on the way, and that is something we are a little concerned about how we're going to handle."
While these will be for the most part closely related strains, which would enable annotation via orthology, "it's going to be very difficult to annotate in detail 53 complete genomes," he said.
ERIC's approach has been to focus "a bit earlier than we had planned" on visualization tools that will help people distinguish between strains. One tool that will be made available through the ERIC site is Mauve, a comparative genomics software package developed at the University of Wisconsin, which can "compare multiple genomes simultaneously and it handles genome rearrangements very well," Greene said.
Greene also echoed Di Francesco's comments that community feedback will be a crucial aspect for future BRC development, and that "outreach and education" will likely play a larger role in future stages of the project.
"Overall it's an interesting program. It's still getting rolling, I think it's still getting its feet fully under it, but it's going very well," Greene said. "I think in the next year or two, we'll really start to see how effective this is going to be."
Di Francesco said that NIAID's long-term goal for the BRCs is improved target identification and validation for vaccines and therapeutic development, which a few of the centers "are really starting to focus" on. "To me, this is probably the most important thing that the BRCs can do for this institute," she said. "That's what I hope to be able to show at the end of these five years."
Bernadette Toner
The National Institute of Allergy and Infectious Diseases' Eight BRC Websites

|BRC|Host institutions|URL|Organisms|Status|
|---|---|---|---|---|
|ApiDB (Apicomplexan Database)|University of Pennsylvania; University of Georgia|http://apidb.org/|Toxoplasma gondii, Cryptosporidium parvum, Plasmodium phylum|Release 1.0 available|
|BioHealthBase|Northrop Grumman; University of Texas Southwestern Medical Center; Vecna Technologies; Amar International|http://www.biohealthbase.org/|Francisella tularensis, Giardia lamblia, Microsporidia, Ricinus communis, Multi-drug resistant Mycobacterium tuberculosis, Influenza virus|Version 1.0 to launch Jan. 31, 2006|
|ERIC (Enteropathogen Resource Integration Center)|SRA International; University of Wisconsin Madison|http://www.ericbrc.org/eric/|Enterobacteriaceae including: Diarrheagenic E. coli, Shigella, Salmonella, Yersinia enterocolitica, Yersinia pestis|Version 1.0 in beta testing|
|NMPDR (National Microbial Pathogen Data Resource Center)|University of Chicago; Fellowship for Interpretation of Genomes; University of Illinois Urbana-Champaign|http://www.nmpdr.org/|Staphylococcus aureus, pathogenic vibrios, Listeria monocytogenes, Campylobacter jejuni, Streptococcus pyogenes, Streptococcus pneumoniae|Version 2 launched Sept. 9|
|Pathema|The Institute for Genomic Research|http://pathema.tigr.org/|Bacillus anthracis, Clostridium botulinum, Burkholderia mallei, Burkholderia pseudomallei, Clostridium perfringens, Entamoeba histolytica|Version 1.0 launched July 11|
|PATRIC (PathoSystems Resource Integration Center)|Virginia Bioinformatics Institute; Loyola University Medical Center; Social and Scientific Systems; University of Maryland|https://patric.vbi.vt.edu/|Brucella, Coxiella burnetii, Rickettsia, Caliciviruses, Coronaviruses, hepatitis A, hepatitis E, rabies|Updated annotation of B. melitensis released Oct. 6|
|VBRC (Viral Bioinformatics Resource Center)|University of Alabama Birmingham; University of Victoria, Canada|http://www.biovirus.org/|Variola major virus, Arenavirus, Hanta virus, Rift Valley fever virus, Ebola virus, Marburg virus, Dengue virus, California encephalitis group virus, Kyasanur Forest disease virus, Omsk hemorrhagic fever virus, West Nile virus, Alphavirus, Hantaan virus, Puumala virus, Crimean-Congo hemorrhagic fever virus, Yellow fever virus, Tick-borne encephalitis, Nipah virus, Equine morbillivirus|Website launched April 11|
|VectorBase|University of Notre Dame; European Bioinformatics Institute; European Molecular Biology Laboratory; Institute of Molecular Biology and Biotechnology; Harvard University; Purdue University; University of California Riverside|http://www.vectorbase.org/|Invertebrate vectors of human pathogens: Anopheles gambiae, Aedes aegypti, An. funestus, Culex pipiens, Ixodes scapularis|A. aegypti genome released Oct. 19|