Genomics at the NIH is not limited to NHGRI. In fact, when Eric Green, director of the NIH intramural sequencing facility, heard that GT wanted to provide an overview of genomics, proteomics, and bioinformatics across the various institutes he said, “I’m really surprised you are trying to tackle the whole campus. You’re never going to get it all.”
He was right. More than 1,000 labs and 17,000 full-time employees are working on at least 2,000 research projects, mostly within the 75 buildings on the 300-acre main campus in Bethesda, Md. There are additional campuses in Research Triangle Park, NC; Baltimore, Md.; Frederick, Md.; Hamilton, Mont.; and Phoenix, Ariz. Virtually every one of the 27 institutes and centers has at least some genomic component. Many of the institutes, for example, are conducting large-scale SNP association studies for a particular disease area. The National Institute of Environmental Health Sciences has begun an environmental genome project, which is looking at genetic variations that increase susceptibility to environmentally associated diseases. And the National Institute of Diabetes and Digestive and Kidney Diseases is in the planning stages of its Diabetes Genome Anatomy Project.
An NIH intramural research budget of $2.6 billion may sound like nothing to laugh at. But that’s less than $2.6 million per lab on average, not a lot of cash when you consider the grand scale of many of the projects. It’s no surprise, then, that NIH has become a hotbed of technology development — researchers looking to create ways to speed things up, cut costs, and figure out ways to pool resources. The following pages offer a look at just a few examples of innovative genomic technology development and approaches. But first we offer an interview with NHGRI director Francis Collins, in which he tells GT readers about the importance of supporting tech development.
Francis Collins has been directing the National Human Genome Research Institute since the days when it was just a center, not even an institute. Aside from the glory Collins has basked in over the past year for his role in the Human Genome Project, he is also recognized within the biological research community for establishing the NIH intramural research program in genome research, now one of the premier research units in human genetics in the US. It’s his own NHGRI lab, Collins says, that keeps him sane. He heads a team of 10 researchers that looks for genomic links to type-2 diabetes.
Collins holds a PhD in physical chemistry from Yale and an MD from the University of North Carolina. He occasionally commutes on a Honda Nighthawk motorcycle from his home in Bethesda to his office in Building 31 on the NIH campus. There, in a 45-minute interview with GT editor Adrienne Burke one morning in May, Collins spoke about supporting genomic technology developments and the H in NHGRI.
First, could you describe your role at NHGRI and within NIH?
COLLINS: Something like 85 percent of the NIH budget goes to the extramural community, to our grantees. My main job as director of NHGRI is to oversee that enterprise, to make sure that we are setting ambitious goals, that we are coming up with ways to attract the best and brightest scientists to apply for support to achieve those goals, that we are carrying out the most rigorous, most forward-looking kind of peer review possible, and that we’re nurturing those projects to succeed. Of course, that’s what the Human Genome Project has been all about for the last 12 years.
The most visible part of NHGRI is this extramural funding, but we do more than fund. We serve in a very hands-on way in terms of scientific management, particularly for the genome sequencing, but also for many other aspects like the haplotype map that we’re now getting organized.
What’s the overall budget you’re managing?
COLLINS: The budget for NIH as a whole is $23 billion for the current fiscal year. The overall budget for the genome institute is $429 million for 2002. So we’re about two percent of NIH. And 85 percent of that goes out in grants to institutions all over the country, including the big genome centers at Whitehead, Wash U., and Baylor, but also to hundreds of grantees in institutions that read like the Who’s Who of exciting research in the university environment. We also give grants to biotech companies that are pushing the envelope on technology.
What are your priorities in terms of technology development right now?
COLLINS: (Pulling what looks like a yellow rectangular computer chip about the size of a microtiter plate out of a CD jewel case) Here’s something created by investigators at the University of Michigan through a large grant from us. This would be an example of a MEMS — microelectromechanical system sequencing machine. This is a sequencing instrument built by David Burke and Mark Burns, Burke being a geneticist, Burns being an engineer. Basically what’s engraved onto the surface of this chip are all the components necessary to do a sequencing reaction, including the thermal cycler to allow you to do the cycle sequencing reactions for all four bases, the ability to clean up the product and then run an electrophoretic separation, and the detector to see what comes off the end. And it’s looking pretty encouraging, although at the moment they cannot yet get the kind of resolution with this that you can get with the 3700s. So it’s not quite ready yet for prime time.
We’re also supporting genotyping technology development, recognizing that everybody wants genotypes dirt cheap as soon as possible, and that the current methods are interesting but none of them has broken the sound barrier as far as the cost. We want to get down to the point where you can do a genotype for perhaps .01 cents, not the current 10 cents. If we’re going to really tackle whole-genome association studies with this haplotype map, we’re going to have to drop the costs very substantially.
Is there a percentage of that 85 percent for extramural funding that is dedicated to technology development as opposed to research and sequencing?
COLLINS: In our extramural portfolio we spend about $25 million on technology development, but it varies from year to year and, of course, it’s a function of opportunity. It is the way in which we’re trying to invest in the future to make sure that we don’t get stuck in the current mode of sequencing, genotyping, expression analysis, or proteomics.
A fair amount of this technology development is going to be invested in proteomics. Our view of proteomics is that this is an enormous challenge to try to tackle the protein repertoire of a mammalian cell, and we don’t really as yet have the technology tuned to the point where that could be done in a cost-effective way. 2D gels and mass spec are great, but we’ve got to do better than that if we’re going to see more than five percent of the proteins in a cell.
So we also have a large interest in trying to improve the technology for doing proteomics in mammalian cells so that not only can you characterize what proteins are present and where they’re located in the cell, but what they interact with, what post-translational modifications they have, and so on.
Are there specific examples of new proteomics technology that you’re supporting?
COLLINS: Sure. Look at the work that’s going on in Mike Snyder’s lab at Yale, that is one of our Centers of Excellence in Genomic Science. That program, CEGS, is where a lot of our investments in the future are going to come from. Mike has one of the first ones of those. This is a large research center that involves multiple investigators focused around a theme, in this case the theme being proteomics and in particular protein chips.
Mike’s work involves characterizing proteins in a global way by putting them down on chips and doing biochemistry on the chips. In his case he’s putting the proteins on the chip, not the affinity reagent on the chip — that’s another very important aspect of proteomics that we would like to see happen, but the affinity reagents aren’t quite there yet. What he’s doing is quite exciting. He puts, for instance, the entire yeast proteome on a chip and in one experiment figures out which of these are kinases.
As another example, we fund Marc Vidal in Boston, who is doing a very ambitious protein-protein interaction experiment for all of the proteins in C. elegans — basically taking the two-hybrid system and scaling it up to a matrix that has 19,000 proteins on each side and trying to find out what interacts with what. This involves a lot of robotics and automation and also a lot of good computational work. A lot of what we fund in technology development has a heavy bioinformatics component, otherwise it’s not really going to give you usable data.
Of course, we also have an even larger amount that’s funding production efforts to generate large data sets that we think will be of value to researchers all over the place.
Do you mean production sequencing?
COLLINS: That’s one component. The human genome, of course, is our flagship and we will have that finished by April 2003. There are international critical and substantial contributions to that effort, but NIH has taken the lead all along in terms of organizing and implementing the plan. This is one of my most important jobs — to serve as the project manager for getting the sequence done.
We met at Cold Spring Harbor a couple weeks ago to see where we are at, and, thanks to the incredible hard work and devotion of the 16 centers in six countries, we are on track to get the sequence done by April 2003.
The mouse sequence is another very important production project, and that one we’re doing jointly with the Sanger Institute with funding from the Wellcome Trust. In just the last two or three weeks [we released] the mouse draft assembly. It’s fantastic. I’m getting dozens of e-mails a day from people saying, “This is the most amazing thing.”
The first week in May [the mouse assembly] went up on the websites for Ensembl, NCBI, and Santa Cruz. So now it’s not just the raw reads, which are helpful, but frankly not all that user-friendly for a lot of users. Now it is an assembly, and it is a beautiful assembly. The scaffolds are ordered and oriented; DNA sequence contigs are on the average 16 megabases.
This is much more contiguity than I think our wildest imaginings would have thought possible. That’s why people are excited. Now if they’re doing a positional cloning project, or if they’re looking for the anatomy of their favorite mouse gene, they’ve got it. We have to go on and finish this just like the human. We don’t want to leave it with gaps, and there are plenty of gaps there, but as far as usability this reaches a much higher standard than had been expected. In fact, from what I’ve heard, this assembly reaches a higher standard of contiguity than what has been available previously from Celera, and yet it’s free. We’re very happy with that and it was a lot of work on the part of the sequence producers.
Last question: Once the human genome is done, will you drop the H from NHGRI?
COLLINS: No, not at all. Remember, this is not the Human Genome Sequencing Institute. This is the National Human Genome Research Institute. And the need to do research on the human genome is now greater rather than less, because we have the full sequence to work on. We’re not exactly in the post-genome era, despite the way people like to throw that term around. I think we’re finally in the genome era.
Hey, Got a Tissue?, NCI/NHGRI
Can you get a better deal than this? The Tissue Array Research Program, a collaboration between NCI and NHGRI, offers arrays of 500 spots of tissue for $20 a slide. The arrays contain samples from breast, prostate, colon, lung, brain, and ovarian tumors, as well as melanoma, lymphoma, and normal tissue embedded in paraffin.
More than 300 researchers use the tissue arrays, says Stephen Hewitt, an NCI researcher who runs the TARP lab. The arrays require less material and reagents than traditional tissue sections and allow researchers to study gene or protein expression, localization, and pathway discovery within a biological context in parallel.
The Cooperative Human Tissue Network, an NCI-funded tissue bank, provides TARP with the tissues and also handles the distribution. “They keep the cash and it covers the cost of my tissue,” Hewitt says.
The TARP array, first developed in 2000, is now in its fourth generation. “Usually about 15 percent of the tissue gets replaced from generation to generation,” says Hewitt.
He is also about to launch a new array containing 58 cell lines from the NCI 60, a collection of 60 cancer cell lines often used as a reference standard in research. “There were intellectual property problems with two of the cell lines,” says Hewitt, explaining their absence. “But they don’t impact the array greatly.” The price for the new cell line chips has not yet been set, but Hewitt says he is trying to keep them below $50 apiece. Commercial arrays are available as well, but are much more expensive. “Even if I raise my prices, I’m going to be cheaper by a long shot,” says Hewitt. “Many of the commercial tissue arrays are between $200 and $400 for two slides. And they do not offer the numbers of tumors that I offer.”
He also teaches those interested how to build tissue arrays on their own — NIH grantees regularly visit his lab for training. Aside from TARP’s standard array releases, Hewitt creates custom tissue chips for NIH researchers. “If you could put it in wax, I could probably put it in an array.”
cDNAs, Cradle to Grave, NIA
More than 200 microarray facilities worldwide use Minoru Ko’s cDNA clone set. What’s so special about them? While other cDNA sets are derived from adult tissue, “Our collection includes genes from very early-stage embryos,” says Ko, chief of the National Institute on Aging’s developmental genomics and aging section.
The NIA 15K mouse cDNA clone set, as it is known, represents 15,264 unique genes, many of them novel.
A network of 10 centers in the US, Canada, Europe, and Japan distributes his library. Ko’s clones are not limited to research of early development. “The clone set is large enough to cover not just embryo-specific genes, but also all the other genes,” he says. “So people are using it as a general array clone set, too.”
Ko is now getting ready to release another 7,400 clones. “And there is no overlap between the new and prior sets,” he says. “When you think of the total number of genes in the mammalian genome, we’re pretty close to it.” To meet the demand expected with the new release, Ko plans to enlist another 10 centers to copy and distribute the clone sets.
The main challenge is collecting enough genetic material from a 60-micron pre-implantation mouse embryo to create full-length cDNA libraries. Ko solved this problem by designing a new linker primer that allowed the construction of cDNA clones from sub-microgram amounts of RNA.
NIA is particularly interested in embryonic genes because “many things in effect during aging are already determined during development,” says Ko. Also, understanding gene expression of embryonic stem cells may potentially provide clues on using those cells to treat diseases associated with aging, such as Alzheimer’s and Parkinson’s.
Simple SNPs, NIAAA
When the National Institute on Alcohol Abuse and Alcoholism decided to screen large populations to determine the influence of genetic variation on addictive disorders such as alcoholism, Robert Lipsky, the institute’s molecular genetics section chief, began searching for a sensitive, simple, yet inexpensive SNP discovery method. The existing popular technologies, gel-based and DHPLC methods, all relied on the same principle: heteroduplex DNA that contains a polymorphic mismatch melts at a lower temperature than perfectly matched DNA. Instead of using methods that indirectly measure this thermodynamic property of DNA mismatches, thought Lipsky, why not measure the melting rates directly?
And thus DNA Melting Analysis, or DMA, was born. Lipsky simply reconfigured an Applied Biosystems 7700 TaqMan, designed for real-time PCR, for SNP discovery.
Here’s how DMA works. First, as in the other methods, the target DNA is amplified with PCR. Then the sample is melted and reannealed to allow the variant DNA to recombine with the wild type allele and form a mismatched heteroduplex. “Then you’re going to heat it again, all in the same tube. No transfer,” says Lipsky. “The mismatches will melt out more rapidly at a lower temperature than a perfect match.” To monitor the rate of melting, Lipsky uses a dye that fluoresces only when bound to double-stranded DNA. “So as you melt it the fluorescence decreases,” he says. Lipsky then analyzes the raw data in Microsoft Excel.
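For readers who want to see the logic in numbers, here is a minimal Python sketch of the analysis step. It is not Lipsky's Excel workflow: the sigmoid curve shape, the 85-degree and 83-degree melting temperatures, and the function names are all illustrative assumptions. It simulates fluorescence for a perfectly matched sample and for a sample in which half the duplexes carry a mismatch, then finds the temperature at which fluorescence falls fastest.

```python
import math

def melt_curve(temps, tm, width=1.0):
    """Idealized fluorescence of a duplex that melts around tm (assumed sigmoid)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def negative_derivative(temps, fluor):
    """Central-difference estimate of -dF/dT at the interior temperature points."""
    mids = temps[1:-1]
    slopes = [
        -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]
    return mids, slopes

def apparent_tm(temps, fluor):
    """Temperature at which fluorescence drops fastest."""
    mids, slopes = negative_derivative(temps, fluor)
    return mids[slopes.index(max(slopes))]

# Heat from 70 to 95 degrees C in 0.1-degree steps (hypothetical ramp).
temps = [70.0 + 0.1 * i for i in range(251)]

# Perfectly matched duplex, assumed to melt at 85 degrees C.
wild_type = melt_curve(temps, tm=85.0)

# Hypothetical variant sample: half perfect match, half mismatched
# heteroduplex that is assumed to melt 2 degrees lower.
mixed = [
    0.5 * a + 0.5 * b
    for a, b in zip(melt_curve(temps, tm=85.0), melt_curve(temps, tm=83.0))
]

tm_wild = apparent_tm(temps, wild_type)
tm_mixed = apparent_tm(temps, mixed)
print(f"perfect match: steepest drop near {tm_wild:.1f} C")
print(f"mixed sample:  steepest drop near {tm_mixed:.1f} C")
```

In this toy model the mixed sample's steepest drop lands at a lower temperature than the perfect match's, which is the kind of shift a variant screen would look for.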
“The advantage is that it’s simple,” says Lipsky. DMA avoids messy and labor-intensive gels and the expense of columns, solvents, and buffers of DHPLC. It also creates less hazardous waste. “With DHPLC, if you are doing a screening project like this, you’re talking about liters and liters of chemical waste,” Lipsky says.
The response to the Clinical Chemistry paper that introduced the method has been phenomenal, he says. “We’ve gotten three or four reprint requests a day.”
To further develop the technology Lipsky is looking to form a CRADA partnership (see sidebar, p. 58) with a commercial instrument maker. “We’ve been talking with a couple of companies that actually have instruments that could be adapted for this,” he says. “I’d like to end up with a box that will allow a laboratory to do variant screening at a relatively small cost.”
Cheap Chips, NIDDK
How do you make hundreds of identical protein chips at a time without an arrayer and for pennies apiece? With Scotch tape, Jell-O, and a freezer.
“At NIH we’re limited by space and money, and the institute was not allowing us to buy any robotics,” says Robert Star, chief of renal diagnostics and therapeutics at the National Institute of Diabetes and Digestive and Kidney Diseases.
Arrayers can cost as much as $80,000 a pop. And spotting the proteins by hand limits the reproducibility of the chips. “That’s why we came up with this idea of being able to make up to 800 identical slides for almost no money,” says Star.
To create the arrays, Star, postdoc Takehiko Miyaji, and NCI collaborators Lance Liotta and Stephen Hewitt freeze protein solutions into cylinders, cut them into slices like salami, and transfer them onto a glass slide.
First they stick a series of 23-gauge metal pins, two millimeters apart, into a block of gooey embedding material, freeze it, and then pull the pins out, leaving an array of tiny cylindrical holes. Next they pour various protein solutions, each into a different well, and freeze the gel block again. Then the thrifty researchers cover the surface of the block with a strip of Scotch tape and cut a 10-micron-thick slice off the top. They then transfer the tape, protein side down, onto a glass slide. When they remove the tape, the spots of frozen proteins remain on the slide. Each slice off the frozen block produces another protein array.
“The tricky part turned out to be getting these frozen protein sausages to stick to the block,” says Star. Because the samples and gel froze at different rates, they wouldn’t bind, and the protein dots would drop out. It took the researchers four months to solve this problem, and the answer turned out to be Jell-O. “The Jell-O keeps things in place,” says Star. “It also gives us color.”
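As a back-of-the-envelope check on the figures quoted here, the following Python sketch works out how much frozen block a run of slides would consume. It assumes every cut yields a usable array and that no material is lost between cuts, neither of which the researchers state; the function name is hypothetical.

```python
SLICE_THICKNESS_MICRONS = 10  # thickness of each slice, as described above
MAX_SLIDES = 800              # upper figure quoted by Star

def block_depth_mm(n_slides, slice_microns=SLICE_THICKNESS_MICRONS):
    """Depth of block used by n_slides cuts, assuming no loss between cuts."""
    return n_slides * slice_microns / 1000.0

depth = block_depth_mm(MAX_SLIDES)
print(f"{MAX_SLIDES} slides at {SLICE_THICKNESS_MICRONS} microns each use {depth} mm of block")
```

On those assumptions, 800 slides would use up about 8 millimeters of block.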
The NIH has applied for a patent on the method, called cryoarrays, and it is up for grabs for interested licensees.
So Many Species, So Little Time, NHGRI
The human is nearly done. So is the mouse. So what’s next for NHGRI? With the cost of complete genome sequencing in the hundreds of millions of dollars, the institute must make some tough decisions. Researchers worldwide are campaigning for their favorite organisms, among them dog, chimp, baboon, macaque, cow, and chicken.
“The problem is all these people have opinions and there’s no preliminary data,” says Eric Green, director of the NIH intramural sequencing center. “The genome project wants to continue sequencing genomes, but we don’t know what to sequence and in what order.”
So Green’s lab has undertaken what he calls a reconnaissance mission. To collect data for prioritizing genomes, Green has selected about 40 well-defined regions of the human genome, ranging in size from half a megabase to five megabases, and is sequencing these same regions in up to 20 other species. Researchers can then compare the sequences to see which genomes are likely to yield the most new information.
“You could simply ask the question: given dog and human, does cat help you a whole lot more? If the answer is no, then maybe you want to make that a lower priority,” says Green. Another consideration could be getting genomes that are at distinct evolutionary points.
The project is also driving computational biologists to develop tools to compare as many as 20 species at once. “Right now such tools are in their infancy,” Green says.
What really matters in the end, though, is choosing “the best species to sequence in order to help decode the human genome,” Green says.
Mobility Matters, NIDA
To identify proteins, measuring molecular weight with a mass spectrometer will do just fine. But not for the work of Amina Woods of the National Institute on Drug Abuse. “Molecular weight is not enough information,” she says. “If you have a lipid, a DNA, and a peptide that have the same molecular weight, you would only see one signal with a regular mass spec.”
And since drugs of abuse interact with all three biomolecules, a MALDI-TOF alone doesn’t cut it. To solve this problem, Woods has turned to a new kind of mass spectrometer, a MALDI-IM-TOF. Developed by Kent Gillig at Texas A&M in collaboration with a small Houston company called Ionwerks, the instrument is like a MALDI-TOF, but with an additional chamber — an ion mobility cell.
The electrically charged, helium-gas-filled IM cell separates ions in the gas phase according to their volume-to-charge ratio. DNA drifts through the cell the fastest, followed by peptides. Lipids chug along at the slowest rate. “It separates according to the shape of the molecule, its composition, and its conformation,” says Woods. “So instead of having just one spectrum, because the molecules move differently in the gas, they come out at different times.”
Because the instrument measures biochemical compounds in two dimensions — first volume, then mass — it can potentially also resolve the components of intact bacteria and viruses and determine each molecule’s molecular weight.
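The benefit of the second dimension can be illustrated with a short Python sketch. The three ions, their shared 1,500-dalton mass, their drift times, and the binning tolerances below are invented for illustration and do not come from Woods's instrument; only the drift order (DNA fastest, lipid slowest) follows the description above.

```python
# Hypothetical ions as (label, mass in daltons, drift time in microseconds).
# All numbers are made up for illustration.
IONS = [
    ("lipid", 1500.0, 530.0),
    ("DNA", 1500.0, 310.0),
    ("peptide", 1500.0, 420.0),
]

def signals_by_mass(ions, mass_tol=1.0):
    """Group ions as a mass-only instrument would: one signal per mass bin."""
    bins = {}
    for label, mass, _drift in ions:
        bins.setdefault(round(mass / mass_tol), []).append(label)
    return list(bins.values())

def signals_by_drift_and_mass(ions, mass_tol=1.0, drift_tol=50.0):
    """Group ions by drift-time bin first, then by mass bin."""
    bins = {}
    for label, mass, drift in ions:
        key = (round(drift / drift_tol), round(mass / mass_tol))
        bins.setdefault(key, []).append(label)
    # Report signals in order of arrival, earliest drift bin first.
    return [bins[key] for key in sorted(bins)]

mass_only = signals_by_mass(IONS)
two_d = signals_by_drift_and_mass(IONS)
print(len(mass_only), "signal by mass alone:", mass_only)
print(len(two_d), "signals with ion mobility:", two_d)
```

By mass alone the three molecules collapse into a single signal; once drift time is added as a first axis, each arrives in its own bin.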
So You Want to Sell to the NIH?
“Some vendors have a very naïve idea that somehow if they break into the NIH, they’re going to all be rich and retire early,” says Claire Driscoll of the NHGRI technology transfer office. “It’s just not the case because it’s too diverse.”
Unless you’re selling something that everybody needs, such as pipette tips, it’s best to think of each lab as a distinct customer. Purchases of specialized items are usually driven by individual researchers, so pitching to administrators will get you nowhere.
A great opportunity for exposure to a broad intramural audience is to exhibit at the NIH Research Festival, an annual on-campus, institute-wide conference where staff researchers get to showcase their work. This year’s festival is scheduled for October 15 to 18, and exhibitors hawk their wares on the final two days from booths under large tents.
"The research festival is a really big deal,” says NCI investigator Stephen Hewitt. “Exhibitors fight over space because they are inundated with people who are going to look at equipment, technology, reagents, and supplies.” Last year more than 400 exhibitors displayed their state-of-the-art equipment, supplies, and services.
Once a researcher is interested, the process begins. If the product is more than $2,500 it must be opened for bidding. Preference is given to small and women-owned businesses. Even if they are underbid, they can still get the contract.
Other factors are considered as well. “They ask you if you have foreign components in your system,” says Geospiza CEO Todd Smith, who recently licensed his company’s Finch-Server product for DNA sequence management and analysis to the National Institute of Allergy and Infectious Diseases. (Made in the USA is preferred.) Another possible factor: the Americans with Disabilities Act. For example, using color to represent data in software may actually be a disadvantage if there is a comparable package that doesn’t rely on color, so as to give color-blind researchers equal access.
The process can be drawn out over many months. “Sometimes it’s frustrating. I had to fight with my management just to convince them that it was worth the effort,” says a genomics software company sales manager who requested anonymity. “We could have easily just walked away and said we don’t sell to government because it’s too much of a hassle. And that was a decision we almost made.” Also, be prepared to adapt your contract to fit government guidelines.
The best way to win sales is to wow the researcher with published data and scientific presentations at conferences. “If the scientist really wants a piece of equipment because they’re impressed, they can justify it based on the science and write it up, and everybody else along the administrative chain will go with it,” says Driscoll.
Licensees and Collaborators Welcome
To get access to technology developed at the NIH, there are two ways to go: license the technology and develop it independently, or collaborate with the researchers and work on the technology together.
Exclusive licenses are rare. “Normally in government, when there is an invention we tend to offer it up to everyone,” says Claire Driscoll of the NHGRI tech transfer office. “Unless there is some compelling business reason — a therapeutic or a vaccine where you generally need an exclusive to really be able to do the R&D and to attract the capital you need. But everything else is a non-exclusive and it’s pretty much up for grabs.” Technologies available for license are published in the Federal Register, available online at www.access.gpo.gov/nara. You can also search inventions by keyword on the NIH Office of Technology Transfer website: http://ott.od.nih.gov.
The mechanism for collaboration is entering a Cooperative Research and Development Agreement. With a CRADA, in return for providing research funds, the company has the option to exclusively license inventions developed either jointly or independently by NIH scientists. “Even if the company was working together on the project but it just so happens the invention was made independently by the government side, that company still gets an option to negotiate an exclusive license,” Driscoll says. The commercial partner also maintains control over what the NIH researchers can publish or present at scientific meetings.
Each institute’s tech transfer office negotiates its own CRADAs. But it won’t do any good to contact the tech transfer office directly if you are interested in forming a CRADA, warns Driscoll. “You have to make your pitch to the scientist. I don’t make the pitch for you,” she says. In other words, do your homework: identify the researcher you’re interested in collaborating with, and convince him or her of what you have to offer.
“If an individual scientist or principal investigator says, ‘Well, this company is doing this project, they’ve approached me, and I’m interested,’ then we negotiate the CRADA,” Driscoll says.