The Bill & Melinda Gates Foundation has awarded Stanford University a four-year, $7.5 million grant to create a centralized genomics database to support tuberculosis drug and vaccine development.
The “driving force” for the award, according to project head Gary Schoolnik, is that three other TB research programs funded by the Gates Foundation require access to a comprehensive database on Mycobacterium tuberculosis: one project is developing new drugs to combat latent TB; another is developing vaccines; and the third is identifying biomarkers that are predictive of drug or vaccine efficacy.
As a result, Schoolnik said, the project plans to release a “fast track” version of the database by October, with more complete versions released over the next three years of the grant.
Schoolnik, a professor of medicine, microbiology, and immunology, is leading a team of researchers at Stanford, the Broad Institute, and the Harvard School of Public Health to compile and integrate a broad spectrum of genomic, proteomic, and structural data related to M. tuberculosis.
The primary goal — and challenge — for the effort will be integrating a vast array of disparate resources that already exist for the organism, Schoolnik said. While the M. tuberculosis genome was sequenced in 1998, and the annotated sequence is available via the TubercuList database, Schoolnik noted that there is currently no centralized resource that integrates this sequence data with other information that would be helpful to researchers looking to combat TB.
Schoolnik outlined a broad range of data that will eventually be integrated into the resource. His lab, for example, has already accumulated a large amount of TB gene-expression data that will serve as a solid foundation for the database. In addition, the Stanford team plans to conduct whole-genome RT-PCR-based gene-expression studies on tissue samples from TB-infected patients.
“It’s very difficult to obtain expression information from bacteria within host tissues [using microarrays] because the host tissues have so much more RNA than the microbe contributes,” Schoolnik said, “so you need an ultra-sensitive way to do that, and RT-PCR is the preferred way to do that.”
The Stanford researchers plan to build upon the existing Stanford Microarray Database infrastructure to house the TB microarray data, but integrating that with the RT-PCR experiments will require a bit of work because the RT-PCR data “doesn’t look a bit like the data from a microarray experiment,” Schoolnik said.
“Exactly how these two databases will be melded together so that one can navigate seamlessly between these two datasets is one of the challenges in building this database,” he said.
In addition to the gene-expression data, Schoolnik said the database will include “all the known annotated sequences of Mycobacterium tuberculosis plus related organisms.” The Broad Institute, which is currently sequencing and annotating eight strains of the pathogen, will contribute this component of the resource.
The database will also be integrated with predicted metabolic pathways from SRI International, which currently maintains pathway databases for M. tuberculosis CDC1551 and M. tuberculosis H37Rv as part of its BioCyc suite.
Other components include a database of gene knockouts and associated phenotypes, a database of essential genes, structural information for M. tuberculosis proteins, a database of antigenic M. tuberculosis proteins, and a set of protein-protein interaction data from yeast two-hybrid experiments.
Schoolnik said that some of this data already exists and some has yet to be generated. In addition, the Stanford team is still in discussions with external resources, such as SRI and TubercuList, regarding integration details. “We’re not sure at this point whether we’ll have links out or whether we’ll bring them in,” he said.
The goal, he said, is to create a resource that will allow researchers to navigate easily from a gene of interest to a metabolic pathway, expression pattern, protein structure, or other key information that will help provide insight into drug or vaccine development.
He added that a longer-term goal for the project is to assign “druggability scores” to the proteins in the database, which would enable researchers to predict whether a particular protein’s action can be inhibited. “That’s something that we’re keen to do, although exactly how is not clear right now,” he said.
The primary design goal for the database is to support “certain goals in human medicine,” Schoolnik said. “As far as we know, there is no other database like this for small genomes.”