Microarray technology is a great tool for identifying targets. In fact, it’s too good — discovery scientists are currently inundated with thousands of promising differentially expressed genes about which they have little functional understanding.
Faced with this problem, the genomic and information sciences team at Hoffmann-La Roche built an integrated lab system that helps reduce the vast number of potential hits from microarray experiments to a manageable number for validation. Holly Hilton, a Hoffmann-La Roche scientist, explained the approach at the Marcus Evans Data Analysis and Visualization meeting held in New York, December 3-5, 2002. “How do we efficiently and effectively integrate genomics into the development process?” Hilton asked. Hoffmann-La Roche’s answer, she continued, was to turn to “process biology,” a modular system that breaks each stage of the research process into a separate component. Each module can run as a standalone analysis step and can also be linked to upstream and downstream processes through shared data resources.
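Hilton’s talk did not go into how the modules are wired together in software. As a rough illustration of the idea only, the Python sketch below assumes that each module declares which keys it reads from and writes to a shared store (a stand-in for resources such as the gene index). The names `Module`, `run_pipeline`, the store keys, and the example genes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical shared data resource: one keyed store that every module reads
# from and writes to, standing in for resources such as a gene index.
SharedStore = Dict[str, object]


@dataclass
class Module:
    """One stage of the research process (illustrative structure only)."""
    name: str
    inputs: List[str]            # keys read from the shared store
    output: str                  # key the result is written under
    step: Callable[..., object]  # the analysis itself

    def run(self, store: SharedStore) -> SharedStore:
        # Standalone use: the caller only needs to supply the declared inputs.
        missing = [key for key in self.inputs if key not in store]
        if missing:
            raise KeyError(f"{self.name} is missing inputs: {missing}")
        store[self.output] = self.step(*(store[key] for key in self.inputs))
        return store


def run_pipeline(modules: List[Module], store: SharedStore) -> SharedStore:
    # Linked use: each module consumes whatever upstream modules wrote.
    for module in modules:
        module.run(store)
    return store


# Hypothetical modules: an array step that keeps genes with |log2 ratio| >= 1,
# and a STEP-like step that keeps candidates confirmed by a second assay.
microarray = Module(
    "microarray", ["log2_ratios"], "candidates",
    lambda ratios: sorted(g for g, r in ratios.items() if abs(r) >= 1.0),
)
step = Module(
    "step", ["candidates", "qpcr_confirmed"], "validated",
    lambda cands, confirmed: [g for g in cands if g in confirmed],
)

store = {"log2_ratios": {"G1": 3.1, "G2": 0.4, "G3": -2.5},
         "qpcr_confirmed": {"G3"}}
run_pipeline([microarray, step], store)
print(store["validated"])  # ['G3']
```

Because each module checks only its own declared inputs, the same object can be called on its own with hand-supplied data or chained behind an upstream module.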
The bioinformatics, genetics, and genomics groups at Hoffmann-La Roche began reorganizing their workflows in line with the process biology template at the beginning of 2002, Hilton said. Currently, a microarray module and a STEP (single target expression profiling) module are fully functional. The microarray module profiles thousands of genes within a single tissue type, while the STEP module uses quantitative RT-PCR to generate expression profiles for each gene across hundreds of tissues and donors.
The foundation for both modules lies in a gene index and a tissue index, global resources that contain data on all the genes and tissues under study at the company. The tissue index is a directory of over 3,000 tissues stored in a roomful of freezers that Hoffmann-La Roche has dubbed its “biobank.” The index uses a controlled vocabulary to standardize tissue information from pathology reports, demographics, and other clinical data.
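The article does not describe Roche’s vocabulary itself. The basic mechanism of a controlled vocabulary, mapping free-text variants onto one canonical term and flagging anything unrecognized, can be sketched in Python as follows; the vocabulary entries and function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical controlled vocabulary: free-text variants that might appear in
# pathology reports, each mapped to one canonical tissue term.
TISSUE_VOCAB: Dict[str, str] = {
    "liver": "liver",
    "hepatic tissue": "liver",
    "skeletal muscle": "skeletal muscle",
    "muscle, skeletal": "skeletal muscle",
    "adipose": "adipose tissue",
    "fat": "adipose tissue",
}


def normalize(raw: str) -> str:
    """Lower-case, trim, and collapse runs of whitespace before lookup."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def to_controlled_term(raw: str, vocab: Dict[str, str] = TISSUE_VOCAB) -> Optional[str]:
    """Return the canonical term, or None so the record can be flagged for curation."""
    return vocab.get(normalize(raw))


for report_text in ["  Hepatic   Tissue ", "Muscle, Skeletal", "spleen"]:
    print(repr(report_text), "->", to_controlled_term(report_text))
```

Returning `None` rather than guessing keeps unmapped terms out of the index until a curator adds them.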
The microarray module begins with the tissue index. Tissues are chosen for microarray experiments based on gender, age, disease type, or another classifier. A team of statisticians analyzes the raw data directly from Affymetrix chips using GeneSpring and SAS, and then deposits the results in a GeNet database, which is linked to the gene index via a web interface. Biologists view the gene expression data in GeNet, and can resubmit data to the statisticians for review if necessary.
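Choosing tissues “based on gender, age, disease type, or another classifier” amounts to filtering annotated records. A minimal Python sketch, assuming a hypothetical `TissueRecord` layout (the article does not give the tissue index schema) and invented sample IDs:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TissueRecord:
    # Hypothetical fields; the article mentions gender, age and disease type.
    sample_id: str
    tissue: str
    sex: str
    age: int
    disease: str


def select_tissues(records: Iterable[TissueRecord], tissue: str,
                   disease: Optional[str] = None, sex: Optional[str] = None,
                   min_age: int = 0, max_age: int = 200) -> List[str]:
    """Return sample IDs matching every classifier that was supplied."""
    return [
        r.sample_id for r in records
        if r.tissue == tissue
        and (disease is None or r.disease == disease)
        and (sex is None or r.sex == sex)
        and min_age <= r.age <= max_age
    ]


index = [
    TissueRecord("T001", "liver", "F", 54, "type 2 diabetes"),
    TissueRecord("T002", "liver", "M", 61, "none"),
    TissueRecord("T003", "liver", "M", 47, "type 2 diabetes"),
    TissueRecord("T004", "skeletal muscle", "F", 58, "type 2 diabetes"),
]
print(select_tissues(index, "liver", disease="type 2 diabetes"))  # ['T001', 'T003']
```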
Linking the annotated tissue information with the microarray results has proved useful for Hoffmann-La Roche, Hilton said. In one example, she noted, an upregulated gene appeared to be linked with diabetes, but it turned out to be linked to age instead, and was therefore eliminated from further study. Because age and diabetes are often correlated, the researchers might not have been able to rule out the gene in question if the demographic information from the tissue index had not been at hand, Hilton said.
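Hilton did not say how the age effect was detected. One standard way to check for this kind of confounding is a stratified comparison: contrast diabetic and non-diabetic donors within age bands instead of across the whole cohort. In the Python sketch below, the sample values, the 50-year cutoff, and the function names are hypothetical, chosen so that expression tracks age while most diabetic donors are older.

```python
from statistics import mean
from typing import List, Tuple

# (donor age, has diabetes, expression level): hypothetical values in which
# expression depends on age only, but diabetic donors are mostly older.
Sample = Tuple[int, bool, float]

samples: List[Sample] = [
    (30, False, 1.0), (35, False, 1.1), (40, False, 0.9), (45, True, 1.0),
    (60, False, 3.0), (65, True, 3.1), (68, True, 2.9), (72, True, 3.0),
]


def diabetes_difference(group: List[Sample]) -> float:
    """Mean expression in diabetic donors minus mean in non-diabetic donors."""
    return (mean(e for _, d, e in group if d)
            - mean(e for _, d, e in group if not d))


def age_stratified_difference(group: List[Sample], cutoff: int = 50) -> float:
    """Average of within-stratum differences, holding age roughly fixed."""
    strata = [[s for s in group if s[0] < cutoff],
              [s for s in group if s[0] >= cutoff]]
    # A stratum lacking either diabetic or non-diabetic donors offers no contrast.
    usable = [st for st in strata
              if any(d for _, d, _ in st) and any(not d for _, d, _ in st)]
    return mean(diabetes_difference(st) for st in usable)


print(f"crude: {diabetes_difference(samples):.2f}")                 # crude: 1.00
print(f"age-stratified: {age_stratified_difference(samples):.2f}")  # age-stratified: 0.00
```

The crude comparison suggests a diabetes effect; once donors are compared only with others of similar age, the difference disappears, which is the pattern Hilton described.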
The STEP module is then used to validate the microarray data further. According to Hilton, this stage can eliminate up to 50 percent of the candidate genes that come off the Affymetrix platform. Researchers select the genes they want run on the STEP platform through a web-based interface complete with a shopping-cart-style “gene cart.”
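The article does not state what criterion STEP uses to drop a candidate. Purely as an illustration, the Python sketch below models the gene cart as an ordered, duplicate-free selection and applies a hypothetical concordance rule: keep a gene only if its qRT-PCR log2 change has the same sign as the array result and reaches a minimum magnitude. The rule, the threshold, and the gene values are assumptions, not a documented Roche procedure.

```python
from typing import Dict, Iterable, List


def add_to_cart(cart: List[str], genes: Iterable[str]) -> List[str]:
    """Hypothetical 'gene cart': an ordered selection with no duplicates."""
    for gene in genes:
        if gene not in cart:
            cart.append(gene)
    return cart


def confirmed_by_qpcr(cart: List[str], array_log2: Dict[str, float],
                      qpcr_log2: Dict[str, float], min_abs: float = 1.0) -> List[str]:
    """Keep genes whose qRT-PCR log2 change agrees in sign with the array
    result and reaches min_abs (an illustrative rule, not a documented one)."""
    kept = []
    for gene in cart:
        q = qpcr_log2.get(gene)
        if q is not None and array_log2[gene] * q > 0 and abs(q) >= min_abs:
            kept.append(gene)
    return kept


cart = add_to_cart([], ["G1", "G2", "G1", "G3", "G4"])
array = {"G1": 2.0, "G2": 1.5, "G3": -1.8, "G4": 1.2}
qpcr = {"G1": 1.7, "G2": -0.3, "G3": -1.4, "G4": 0.5}
print(cart)                                  # ['G1', 'G2', 'G3', 'G4']
print(confirmed_by_qpcr(cart, array, qpcr))  # ['G1', 'G3']
```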
Tissues from the biobank are selected, and quantitative RT-PCR is run on 96- or 384-well plates. The resulting data is presented as gene expression levels graphed for each tissue. Clicking on a tissue type links users directly to the tissue index for further information. STEP results are also added to each gene’s entry in the gene index, which holds relevant sequence data, reagent information, and a summary of microarray experiment results.
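The article does not say how raw qRT-PCR readings are turned into per-tissue expression levels. A widely used option is the comparative Ct (2^-ΔΔCt) method: normalize the target gene’s threshold cycle to a reference gene in each tissue, then express each tissue relative to a chosen calibrator tissue. The Python sketch below assumes that method, roughly 100 percent amplification efficiency, and hypothetical Ct values and tissue names.

```python
from typing import Dict, Tuple


def relative_expression(ct: Dict[str, Tuple[float, float]],
                        calibrator: str) -> Dict[str, float]:
    """Relative expression per tissue by the 2^-ddCt method.

    ct maps tissue -> (Ct of target gene, Ct of reference gene). Assumes both
    assays amplify with ~100% efficiency, so one cycle = a twofold change.
    """
    delta = {t: target - ref for t, (target, ref) in ct.items()}
    base = delta[calibrator]
    return {t: 2.0 ** -(d - base) for t, d in delta.items()}


# Hypothetical Ct values for one gene across three tissues.
ct_values = {"liver": (24.0, 18.0), "muscle": (26.0, 18.0), "adipose": (23.0, 19.0)}
levels = relative_expression(ct_values, calibrator="liver")
print(levels)  # {'liver': 1.0, 'muscle': 0.25, 'adipose': 4.0}
for tissue, level in levels.items():
    # Crude text bar chart: one '#' per quarter unit of relative expression.
    print(f"{tissue:<8} {level:5.2f} {'#' * round(level * 4)}")
```

A lower Ct means the transcript crossed the detection threshold earlier, so the two-cycle lead in adipose corresponds to a fourfold higher level than in liver.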
“But is this complexity of integration and automation really necessary?” Hilton asked before any of the conference attendees could. “No,” she said, “if you’re only performing a handful of experiments.” However, the process biology system Hoffmann-La Roche has developed has turned out to be an effective way to feed data from high-throughput genomics processes into the lower-throughput validation steps downstream.
Next on the Hoffmann-La Roche agenda is adding toxicogenomics data into the mix, Hilton said. The company is collaborating with the International Life Sciences Institute on a database of expression profile changes, in the hope of finding predictive gene expression fingerprints that can be used to screen new compounds for toxicity.