By Steve Nadis
Some scour the genome looking for associations between errant genes and disease. But Matthew Meyerson, a pathologist at Harvard Medical School and the Dana-Farber Cancer Institute, has a different idea: using the human genome to track down the bacteria, viruses, and other parasites responsible for illnesses. Here’s how it works: Take a specimen of diseased tissue, sequence its DNA, and eliminate any sequences that match the human genome. What’s left is non-human DNA that, with luck, includes traces of the infectious agent. The beauty of this approach, which Meyerson calls computational subtraction, is that it shifts most of the heavy lifting from researchers to desktop processors.
Meyerson hit upon the idea last year while searching a human DNA database for sequences similar to a bacterial gene he was studying. He found three DNA sequences that matched well, but discovered they were not part of the human genome at all. Instead, they were bacterial sequences mixed in with the human sample.
As a test, his group took a library of 3.2 million expressed sequence tags (ESTs) collected from various human organs, both healthy and diseased, and compared it with seven databases to subtract the human component. Software written by Griffin Weber and Jay Shendure, MD/PhD students in Meyerson’s lab, performed the subtraction automatically. About 90 percent of the sequences matched and were thus eliminated. The group also compared the library to the mouse genome in the hope of screening out sequences that might lie in unfinished sections of the human genome; five percent of the remaining sequences were deleted this way. A match was arbitrarily defined as a BLAST score of 60 bits or higher, corresponding to an alignment of roughly 30 bases in a row.
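To make the subtraction step concrete, here is a minimal Python sketch. It is not the software from Meyerson’s lab: it swaps BLAST for plain exact matching of 30-base runs, a crude stand-in for the "roughly 30 bases in a row" rule of thumb. The function names and the toy sequences are invented for illustration, and the sketch ignores mismatches, gaps, and the reverse strand, all of which a real BLAST search would handle.

```python
# Sketch only: exact k-mer matching stands in for a BLAST search.
K = 30  # assumed run length, from the "roughly 30 bases in a row" figure


def kmers(seq, k=K):
    """Yield every length-k substring of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference_seqs, k=K):
    """Collect all k-mers found in the reference (e.g. human) sequences."""
    index = set()
    for ref in reference_seqs:
        index.update(kmers(ref, k))
    return index


def subtract(reads, reference_index, k=K):
    """Keep only the reads that share no k-mer with the reference.

    Reads shorter than k have no k-mers and are therefore kept;
    a separate length filter would be needed to drop them.
    """
    return [
        read for read in reads
        if not any(km in reference_index for km in kmers(read, k))
    ]


if __name__ == "__main__":
    human = ["ACGT" * 20]                      # toy "human" reference
    sample = ["ACGT" * 10, "TTGACCA" * 6]      # toy reads from a specimen
    print(subtract(sample, build_index(human)))  # only the second read survives
```

Holding the reference as a set of k-mers keeps each lookup cheap, but a set covering a whole genome would need far more memory than this toy suggests, which is one reason real pipelines rely on dedicated alignment tools.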
It took a desktop computer two weeks to whittle down the original list of 3.2 million to 65,000 sequences with no apparent link to the human genome. Further analyses showed that some of these sequences were from infectious bacteria, viruses (Epstein-Barr, human papillomavirus), protozoa, and fungi.
In a subsequent experiment, Meyerson and Yaohui Xu took a biopsy from a lymphoma patient and sent cDNA from the sample to Eric Lander’s group at the Whitehead Institute for sequencing. Of the 27,000 sequences that were read, 40 were deemed non-human by computational subtraction. Of those 40, 10 were from the Epstein-Barr virus strain thought to cause the disease.
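The yields quoted for the two experiments are easier to compare as fractions. The short Python sketch below uses only figures from the article; the variable and function names are made up for illustration. It shows that about 2 percent of the EST library survived subtraction, against about 0.15 percent of the biopsy reads, a quarter of which came from Epstein-Barr virus.

```python
# Figures quoted in the article; names are illustrative only.
est_total, est_left = 3_200_000, 65_000            # EST library test
biopsy_total, biopsy_left, ebv = 27_000, 40, 10    # lymphoma biopsy


def fraction(part, whole):
    """Return part / whole as a float."""
    return part / whole


print(f"EST library surviving subtraction: {fraction(est_left, est_total):.2%}")
print(f"Biopsy reads surviving subtraction: {fraction(biopsy_left, biopsy_total):.2%}")
print(f"Biopsy survivors from Epstein-Barr: {fraction(ebv, biopsy_left):.0%}")
```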
“Of course, the real proof will be using this technique to discover an unknown pathogen,” Meyerson says. To that end, he and his colleagues are working with samples from patients with Crohn’s disease, Hodgkin’s disease, and multiple sclerosis, all disorders of mysterious origin. If this strategy works, it could represent a big advance over conventional methods. Traditionally, researchers identify pathogens by growing them in a dish from a sample of infected tissue. But not all organisms can be grown in culture. Another technique called experimental subtraction involves physically removing sequences present in both the sample and a control. “That’s much more laborious than doing it on computers, and more bias-prone,” says Meyerson.
Meanwhile, his team is working hard to refine its computational tools. They’re creating better filters that take into account the quality or confidence level of each base call in a sequence. New filters will discard sequences that are too short to be deemed reliable and others that look spurious. The method will also improve as the human and mouse genomes are filled in. Weber sums up the effort this way: “A biolab can’t look at 3.2 million sequences, but they can look at a few dozen. So we’re trying to chop that number down as much as we can.”
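A filter of the kind described might look something like the Python sketch below. Everything specific in it is an assumption rather than a detail from Meyerson’s team: it supposes each base carries a Phred-style quality score, and the length and quality cutoffs are hypothetical placeholders. It covers only the "too short" and "low confidence" cases, not sequences that "look spurious" for other reasons.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    name: str
    bases: str
    quals: List[int]  # assumed: one Phred-style quality score per base


MIN_LENGTH = 100     # hypothetical cutoff, not from the article
MIN_MEAN_QUAL = 20   # hypothetical cutoff, not from the article


def passes_filter(read, min_length=MIN_LENGTH, min_mean_qual=MIN_MEAN_QUAL):
    """Return True if the read is long enough and its mean quality is high enough."""
    if len(read.bases) != len(read.quals):
        raise ValueError(f"{read.name}: bases and quality scores differ in length")
    if not read.bases or len(read.bases) < min_length:
        return False
    mean_qual = sum(read.quals) / len(read.quals)
    return mean_qual >= min_mean_qual


def filter_reads(reads, **cutoffs):
    """Keep only the reads that pass the length and quality checks."""
    return [r for r in reads if passes_filter(r, **cutoffs)]
```

Running such a filter before subtraction would shrink the list that the slower matching step has to process, and it would remove short reads that can never reach the match threshold and so would otherwise pass as "non-human" by default.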