There was just one problem at all the parties, press conferences, and publications celebrating the completion of the Human Genome Project (again and again and again) — and that problem was the sinking feeling in the pit of scientists’ stomachs as they realized that, even though they’d achieved their goal of sequencing the human genome, they weren’t necessarily all that much closer to understanding what any of it meant.
The hope, of course, was that comprehensive sequencing and careful annotation would reveal what those 100,000 genes were doing. (Oops.) But with the human genome, just as with the other genomes that were beginning to take shape, it became evident that researchers could identify and functionally describe only a small minority of genes. Some three-quarters were no more than a series of bases on a computer screen.
So along came functional genomics, or the science of elucidating function from all those mystery regions of the genome. If there’s any field that can unite scientists across organisms and across technologies, this is it. The quest for function is critical enough to bring together researchers from any number of backgrounds, combining their expertise to build better tools and explore function in new ways, from imaging to informatics to synthetic biology. Proteomics has taken a leading role, and gene expression continues to be important as the field evolves.
Four years ago, the US National Institutes of Health threw the field a bone by launching the multimillion-dollar ENCODE project — short for the Encyclopedia of DNA Elements — to study functional elements in the genome. The pilot project, which studied 1 percent of the genome, wrapped up this year and published its findings in Nature in June.
But ENCODE is just a small sliver of the functional genomics field. In this story, GT profiles a handful of leading scientists using different technologies and innovative approaches to the chore of assigning function to genes. The wide-ranging projects highlighted here serve to underscore just how broad this field is now — and promises to be in the future.
Pathogen Functional Genomics: Sequencing Just Wasn’t Enough
In a prescient move six years ago, the National Institute of Allergy and Infectious Diseases issued an RFP for a functional genomics facility that could serve as a central hub to provide microarrays and other resources for a small but growing community of scientists interested in these tools.
The Institute for Genomic Research won the contract, and in October 2001 the Pathogen Functional Genomics Resource Center opened its doors — just weeks before the beginning of the anthrax letters spree in the US. Since then, the center has grown and today (hosted by the J. Craig Venter Institute) its resources reach hundreds of scientists.
Co-directed by Scott Peterson and Robert Fleischmann, the center’s technology mainstays are pathogen-focused microarrays and clones. “We support on the order of 500 investigators with DNA microarrays,” says Fleischmann, and “close to 200 investigators with clones.” The arrays are homegrown — glass slides spotted with 70-mer oligos — and now that more and more strains are becoming available for each pathogen, the center’s scientists are reaching density limitations with their species-specific arrays, Peterson says. “We’re looking at having 20 or more genomes that have been sequenced, each of them contributing highly divergent or truly novel genes. That’s turning into fairly gargantuan arrays.” Fleischmann adds that the center is looking to team up with a major array manufacturer to help overcome this challenge. As for the clones, “we create a resource that allows [users] to take those ORFs and put them into whatever mode of operation they’re interested in,” says Peterson.
At the functional genomics center, serving the community is the name of the game. That means that technology platforms have to be suited to as broad a user group as possible — and that the directors have to stay a step ahead, choosing which instruments to invest in to keep up with researchers. Peterson says that after microarrays and clones, proteomics has proven to be of growing interest. That’s meant both classical proteomics as well as “very high-throughput protein expression and purification,” all of which have been added to the center’s repertoire.
In Peterson’s view, studying proteins might be just what researchers need to answer the questions that sequencing hasn’t been able to. A rule of thumb in the field, he says, is that generally 30 percent or so of an organism’s genes will be amenable to “reasonable annotation.” Early on, scientists thought that adding more sequencing would clear up the rest of those genes that weren’t so easy to annotate — but time and heavy doses of sequencing have proven otherwise. “Evolution has created independently similar functions in microbial organisms. We see this all the time: eukaryotic polymerases and prokaryotic polymerases evolved independently” though they carry on similar functions, Peterson says. “Genome sequencing … allows one to annotate genes with functions at an unprecedented rate, but there really is no shortcut for those genes that don’t share common ancestry.” Because of that, he sees tremendous promise in structural biology studies, where he hopes that tracking function by protein might reveal similarities that couldn’t be recognized through standard sequence analysis.
Recently, the center instituted a new white paper process designed to “as much as possible allow the scientific community to direct what it is that we do,” Peterson says. Investigators are encouraged to familiarize themselves with the center’s technology platforms and then submit mini-R01 proposals in which they make clear how their goals would also help the rest of the community. (“A proposal that comes in and serves the need of that particular lab alone will not fare very well,” Peterson says.) The center reviews proposals to see whether the technology requirements are a good fit for its capabilities, and separately, NIAID members review the proposals for a broader sense of relevance and usefulness for a range of users. The white paper process just went through its first round, and Fleischmann says, “We’re finding this to be very valuable.”
Rooted in Synthetic Bio, a Tunable Switch to Dial Genes Up and Down
When Jim Collins looks at the genome, he’s not seeing the complex biological system that most scientists do. Instead, Collins sees an engineering problem — and one that he can help solve.
In his latest contribution, Collins, a professor of biomedical engineering at Boston University, drew on his background in synthetic biology to figure out a better way to knock down genes. RNAi and other knockdown techniques are frequently used in functional genomics, with the simple idea that if you turn something off, maybe you can figure out what it was doing in the first place. But the problem with those approaches, as Collins saw it, was that they don’t achieve anything close to 100 percent shutoff of the target genes. “With RNAi you can’t get a very tight off,” he says. “You’ll still get maybe 10 percent expression, and that may be enough to yield the phenotype.”
So Collins and graduate student Tara Deans looked to see whether the situation could be improved. Collins had previously designed a toggle switch in bacteria that consisted of two genes; each one would turn the other gene off, and researchers could use the switch by assigning control to one of the genes, thereby getting “very tight repression” of the other gene. But getting the switch to work in mammalian cells proved far more difficult. “Our initial efforts failed miserably,” Collins says. The problem: the repression proteins that had worked so well in bacteria were only about 85 percent effective in mammalian cells.
Inspiration struck, and Collins realized that combining the toggle switch idea with RNAi could be just the right solution. The plan: “combine a repressor protein to shut off transcription, and then have an RNAi component built in so that if any transcript leaked, it would shut it down before” it could get out into the system, Collins says. It worked so well in mouse and human cells, giving “greater than 99 percent shutdown,” that Collins and his team saw the new switch could even be used to finely tune gene expression, instead of just turning a gene off. “You can tune it like a rheostat,” he says. You can also turn genes on and off over time. For instance, scientists studying mouse development could turn a gene off during a particular developmental state and then flip it back on later in the mouse’s life, Collins says.
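A rough back-of-the-envelope way to see why layering the two mechanisms helps: if the transcriptional repressor and the RNAi component act independently, their leakage multiplies. (The independence assumption and the specific numbers below are illustrative, not taken from the Cell paper.)

```python
def combined_residual(repressor_leak: float, rnai_leak: float) -> float:
    """Fraction of expression escaping both layers, assuming the
    repressor and the RNAi component act independently."""
    return repressor_leak * rnai_leak

# Suppose ~15% leaks through the repressor alone and ~10% through RNAi alone.
leak = combined_residual(0.15, 0.10)
print(f"combined residual expression: {leak:.1%}")  # 1.5%
print(f"shutdown: {1 - leak:.1%}")                  # 98.5%
```

Multiplying two imperfect layers already pushes residual expression well below what either achieves alone, which is the intuition behind the greater-than-99-percent shutdown Collins reports.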
To fully appreciate the “tight off” the switch provides, the Collins lab tested it out with DTA, a protein so toxic that even a single molecule can kill a cell. To Collins’ delight, his team was able to “use the switch to make stable cell lines” that actually produce DTA yet aren’t affected by it.
The beauty of the tunable switch — which Collins and his group described in a paper in Cell earlier this year — is that it’s a fully modular system, meaning it can be used to control any particular gene of interest, according to Collins. Since the paper came out, his lab’s phone has been ringing off the hook; most people who contact them are interested in using the switch for functional genomics studies. Collins and his team are using the switch for the same purpose, reverse engineering networks to identify genetic mediators in a variety of biological processes, such as aging.
From Sequence Alignment to Image Analysis, Breakthroughs at ASU
Like many an informatics authority, Sudhir Kumar developed his expertise simply because he needed a tool that didn’t exist at the time. As a researcher at Pennsylvania State University in the early ’90s, Kumar was looking for a program that would gather DNA and protein sequences and then estimate the relationship between those sequences — but there wasn’t such a tool, so he built his own.
That was the beginning of MEGA, which just this summer was released in its fourth version. When Kumar moved to Arizona State University, where today he directs the Center for Evolutionary Functional Genomics, “the emphasis became much more on developing a tool for scientists.” And so MEGA, or the Molecular Evolutionary Genetics Analysis tool, was improved and released openly to the community. Today, Kumar says, the tool has been cited in more than 6,000 papers.
One of the reasons it’s so popular, he says, is that it’s a program for scientists rather than bioinformaticists. “It’s really made for scientists at the forefront of experimental research to make it easy to analyze data,” Kumar says. The tool grabs data from the Web, GenBank, and a host of other resources — users can, of course, add their own data to it — pulling together sequences to build alignments and “estimate evolutionary divergence or sequence divergence,” he adds. MEGA’s output includes a natural-language description of the analysis and display that users see so that even people without a background in evolutionary biology can understand the results, Kumar says.
With the public launch of MEGA, Kumar officially began his career of building informatics tools that scientists could use to study functional genomics. The idea was to examine “genome function and species patterns through bioinformatics,” he says. Even today, he adds, “the biggest impediment is not the amount of data sometimes but rather its effective analysis. … We really have to come up with ideas and tools.”
To that end, another major focus in his lab has been the development of a library called FlyExpress. As bioimaging technology has become a more common element of studying gene function, the need for better image collection and analysis tools has grown significantly. Kumar’s project is specific to gene expression in fruit flies, but he hopes that the approach can be translated to other organisms studied with imaging as well. “We are developing basically the fundamental tools for computational biology of spatial gene expression patterns,” he says. Scientists have for some years now stained genes in an organism and then watched through development to see where and when the gene lit up. It’s been especially helpful during embryogenesis, where a gene might be used for just a short time and then never turn on again.
At Kumar’s center, scientists have pulled together about 50,000 images of standardized fly embryos stained for some 3,000 genes. But with the popularity of high-throughput imaging, he says, he expects the FlyExpress library to grow to more than 150,000 images in the next few years. The system is online now, and researchers can use it to submit queries such as, “Tell me all the expressions of this gene over time,” Kumar says. Developing FlyExpress meant creating robust pattern-matching techniques that would accommodate images of an embryo that are taken from the top, bottom, or side, depending on the original scientist’s interest. “This is functional genomics, to me, based on image analysis,” Kumar says.
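FlyExpress’s actual matching is considerably more sophisticated (it has to standardize embryo orientation first, among other steps), but the flavor of comparing two expression patterns can be sketched as an overlap score between already-standardized binary staining masks. This is a simplification of ours, not the FlyExpress algorithm.

```python
def jaccard(mask1: list[list[int]], mask2: list[list[int]]) -> float:
    """Overlap score for two same-sized binary expression masks
    (1 = stained pixel): intersection over union."""
    inter = union = 0
    for row1, row2 in zip(mask1, mask2):
        for a, b in zip(row1, row2):
            inter += a & b
            union += a | b
    return inter / union if union else 0.0

# Two toy "embryos" whose expression domains partially overlap.
embryo_a = [[0, 1, 1, 0],
            [0, 1, 1, 0]]
embryo_b = [[0, 0, 1, 1],
            [0, 0, 1, 1]]
print(f"pattern similarity: {jaccard(embryo_a, embryo_b):.2f}")  # 0.33
```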
New Push for Proteomics Follows Better Genomics Technologies
Five years ago, the University of Zurich and ETH Zurich pooled their resources to launch the Functional Genomics Center Zurich, in part to provide services to the institutions’ scientists who were showing more and more interest in the emerging field. Today, says Managing Director Ralph Schlapbach, the center is a service provider like a core, but also conducts its own research.
“When we started in 2002, this was mostly gene expression analysis using microarray technology,” Schlapbach says. Now, while microarraying remains “one of the strongholds of our work,” he says, the largest part of the center’s infrastructure goes toward qualitative and quantitative proteomics. “Many of the projects have moved to the next step of the central dogma of molecular biology — people are now doing proteomics.” That includes protein abundance measurements, or protein expression monitoring, as well as studies of post-translational modifications and more.
The way Schlapbach sees it, the path in proteomics reflects the technology curve for genomics. “People are moving from classical gene expression to epigenetics, tiling arrays, ChIP on chip,” he says. “Resolution is getting higher in terms of gene expression and gene expression regulation. That’s parroted on the proteomics side.” Scientists have gone from qualitative to quantitative, he says, and are now zooming in to look at traits such as phosphorylation, glycosylation, and other PTMs.
The real challenge facing functional genomics is in getting these two sides to play well together, Schlapbach says. To really appreciate what’s going on in a system — and use that to elucidate function — scientists will have to marry the gene expression data with proteomics data. But “that’s way beyond our current knowledge of how to integrate and compare the data,” he says. The problem could get even more intimidating as next-gen sequencing is added to the mix: Schlapbach predicts that as ultra-deep sequencing becomes more accessible, that will be the source of gene expression data, rather than array platforms.
At his center, one of the biggest projects is trying to “unravel the full proteome of a model system — Arabidopsis, C. elegans, or Drosophila,” for instance, Schlapbach says. But the team isn’t just jumping in and hoping for the best. To make sure they’re acquiring high-quality data that they will be able to match up with gene expression data in the future, they’re building everything from the ground up, including their LIMS. Schlapbach says the rate-limiting step will be biological validation, which by current methods simply can’t produce data in the high-throughput fashion that other technologies churn out. Building good models and generating predictions in silico will be an important step in functional genomics, he says, but “all you do there you have to actually validate” before the finding can be really counted.
Ultra-Deep Sequencing Complements Reverse Genetics
Edwin Cuppen has a problem. Group leader of functional genomics and bioinformatics at the Hubrecht Institute, Cuppen has chosen reverse genetics as his path to functional genomics, and that carries with it an inherent challenge: after you perturb a gene, how do you know which phenotypic change to look for to measure the effect of the perturbation?
But the advantage of reverse genetics is that it’s incredibly high-throughput, thanks to enormous collections of knockout mice, zebrafish, worms, and other organisms. So Cuppen perseveres. And along the way, he found that next-gen sequencing would be a critical tool in helping him look for function.
“The trick that we use is chemical mutagenesis to damage the DNA,” he says. After mutagenizing the founder animals — he has used this in zebrafish and more recently in C. elegans — his group takes DNA from each animal and sequences the exons of the gene of interest to check for point mutations that would indicate the mutagenesis had affected the gene in question. “That’s where we try to implement next-generation sequencing technology,” he says. Ideally, what the team would like to do is look not just at the gene of interest, but to scan every open reading frame to see which gene or genes were affected by the mutagenesis. For organisms more complex than C. elegans, the cost of sequencing is still prohibitive for the volume of coverage necessary for that kind of effort.
Cuppen, who has made use of this approach in scanning for chemical deletions in addition to point mutations, says, “Currently, we are limited in generating genome-wide collections of knockouts of genes.” Another hurdle, of course, is the curse of reverse genetics: “In most cases, you don’t know what could be the phenotype,” he says. “That’s definitely a challenge.”
The Next Generation: Functional Genomics in Education
You know a field has made its mark when universities start offering degrees in it. A number of degree-granting programs have sprung up to train students in this emerging discipline.
One example is at North Carolina State University, where the school pulled together courses offered through a number of departments into a functional genomics program that can lead to a master’s or PhD degree. With training grants from NIH, the school can support about 20 students per year in the program (which also has a bioinformatics half). But it’s become so popular that “we now have about 300 applicants per year,” says David Bird, director of the functional genomics part of the genomic sciences graduate program.
He says courses are based on theory and tend not to be technology-driven; the goal is to prepare students who will eventually work in industry or start their own labs with a functional genomics focus. “We can RNAi out every gene in C. elegans, and we still don’t know the function of half the genes,” he says.
A much smaller degree program is based at the University of Maine, operated in conjunction with the Jackson Laboratory and the Maine Medical Center Research Institute. “We are a large state with a small scientific community,” says Barbara Knowles, director of the PhD Program in Functional Genomics. So the institutes decided to team up and put a joint educational effort in place. “All the courses in our curriculum are taught on [interactive] TV,” Knowles says. “Students have to rotate in three labs … and at all three sites.”
There are two students in the program right now, she says, and several past graduates are getting their start in the research world. It is funded by the National Science Foundation as part of its Integrative Graduate Education and Research Traineeship program.