By Aaron J. Sender
Brrrrr. There’s a draft in here. Oh, it’s the human genome. Researchers have proclaimed that April 2003, the 50th anniversary of the double helix, will also be the deadline for turning the draft into a complete sequence, yet many holes remain. Automated gel electrophoresis has done a good job of sequencing the bulk of the genome, but what about those extensive repeat and GC-rich regions?
Xian Chen of Los Alamos National Laboratory has an answer: mass spec. “Gel artifacts will introduce a lot of errors in DNA sequence data,” says Chen. The bases in GC-rich regions, for example, pair up into rigid secondary structures that resist gel-based sequencing. And gels lack the resolution to distinguish definitively between, say, 11 consecutive Cs and 12. “In order to find out exactly what the ambiguous sequences are, we need to use a more accurate way to validate them,” says Chen.
Chen came upon the idea of using isotope-labeled nucleotides and a mass spectrometer to nail down sequences while studying DNA structure with NMR. He was replacing all the naturally occurring carbon-12 with carbon-13, and nitrogen-14 with nitrogen-15, to boost the NMR signal from those atoms. “Then I realized that this kind of stable-isotope labeling can also generate a mass change without changing the structure or chemical and physical properties of molecules,” says Chen.
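How large is the mass change Chen exploits? A minimal sketch, assuming every carbon and nitrogen atom in a deoxynucleotide monophosphate is swapped for its heavy isotope (the article does not spell out Chen's exact labeling scheme): the shift is simply the atom counts multiplied by the isotope mass differences. `heavy_shift` is an illustrative helper, not Chen's code.

```python
# Sketch: idealized mass increase when every carbon and nitrogen in a
# nucleotide is replaced by its heavy stable isotope.
# Assumption: full 13C/15N labeling; Chen's actual scheme is not stated.

# Monoisotopic masses in daltons.
C12, C13 = 12.0, 13.0033548
N14, N15 = 14.0030740, 15.0001089

# (carbon atoms, nitrogen atoms) per deoxynucleoside monophosphate,
# counting both the base and the sugar.
ATOMS = {"dAMP": (10, 5), "dCMP": (9, 3), "dGMP": (10, 5), "dTMP": (10, 2)}

def heavy_shift(n_carbon: int, n_nitrogen: int) -> float:
    """Mass gained (Da) when all C and N atoms carry the heavy isotope."""
    return n_carbon * (C13 - C12) + n_nitrogen * (N15 - N14)

for name, (c, n) in ATOMS.items():
    print(f"{name}: +{heavy_shift(c, n):.2f} Da")
```

Under this idealized full labeling, a C gains about 12.02 Da, in line with the roughly 12-Dalton-per-C shift Chen relies on, while the labeled molecule keeps the same structure.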
Each C carrying the heavier carbon and nitrogen isotopes, for instance, weighs about 12.1 Daltons more than an unlabeled C. Chen uses this mass difference to resolve ambiguous sequences. He obtained a GC-rich region of chromosome 19 from the Joint Genome Institute (JGI). Through repeated resequencing with gel electrophoresis, JGI’s finishing team determined that the region contained a string of 10 to 12 Cs but could not pin down exactly how many. When Chen introduced the labeled Cs, the mass shifted by 132.7 Daltons, corresponding to 11 Cs.
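The arithmetic behind the chromosome 19 result is a division and a rounding. A minimal sketch, assuming the roughly 12.1 Da per-C shift quoted above and a hypothetical tolerance of a quarter of a base for rejecting shifts that do not fit a whole number of Cs (`count_labeled` and its parameters are illustrative names, not Chen's software):

```python
# Sketch: infer how many labeled nucleotides a fragment contains from the
# mass shift between its labeled and unlabeled forms.
PER_C_SHIFT_DA = 12.1  # per-C shift quoted in the article

def count_labeled(mass_shift_da: float,
                  per_base_shift_da: float = PER_C_SHIFT_DA,
                  tolerance_bases: float = 0.25) -> int:
    """Return the number of labeled bases implied by a mass shift.

    Raises ValueError if the shift is not close to a whole multiple of the
    per-base shift (tolerance_bases is a hypothetical cutoff).
    """
    estimate = mass_shift_da / per_base_shift_da
    n = round(estimate)
    if abs(estimate - n) > tolerance_bases:
        raise ValueError(f"{mass_shift_da} Da is not near a whole number of bases")
    return n

# JGI's candidates for the chromosome 19 run were 10, 11 or 12 Cs.
for n in (10, 11, 12):
    print(f"{n} Cs -> expected shift {n * PER_C_SHIFT_DA:.1f} Da")
print(f"observed 132.7 Da -> {count_labeled(132.7)} Cs")
```

The three candidates sit about 12 Da apart (121.0, 133.1 and 145.2 Da), so the observed 132.7 Da shift maps unambiguously to 11.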
“The mass spec can deliver an answer within seconds, while a gel you have to run overnight,” says Chen. “And for these kinds of errors they can never pick that out, no matter how much resequencing they do.”
Chen further speeds up the process by running both the labeled and unlabeled samples through the mass spec together, bypassing the need to calibrate the instrument. “In our case because we are looking for the difference, we don’t need to calibrate the spectrum,” he says. “We can just do one measurement and know the mass shift and then immediately we know the content of the sequence.”
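Chen's point about calibration can be illustrated numerically. In this toy model, which is an assumption rather than his analysis, miscalibration is treated as the same linear distortion (a gain error plus a constant offset) applied to every peak in one spectrum, and the 6,000 Da fragment mass is hypothetical. A 0.2% gain error moves each absolute mass by about 12 Da, roughly one C, yet the difference between the paired peaks still rounds to the same count.

```python
# Toy model (assumption): the instrument reports gain * m + offset for a
# true mass m, identically for every peak in the same spectrum.
PER_C_SHIFT_DA = 12.1        # per-C shift quoted in the article
TRUE_UNLABELED_DA = 6000.0   # hypothetical fragment mass
N_LABELED_C = 11

def observe(true_mass: float, gain: float, offset: float) -> float:
    """Simulate a miscalibrated reading of one peak."""
    return gain * true_mass + offset

def count_from_pair(unlabeled_peak: float, labeled_peak: float) -> int:
    """Use only the peak difference: a shared offset cancels exactly,
    and a gain error is applied only to the small difference."""
    return round((labeled_peak - unlabeled_peak) / PER_C_SHIFT_DA)

true_labeled = TRUE_UNLABELED_DA + N_LABELED_C * PER_C_SHIFT_DA

for gain, offset in [(1.0, 0.0), (1.002, 0.0), (0.998, 5.0), (1.0, -7.5)]:
    unlabeled = observe(TRUE_UNLABELED_DA, gain, offset)
    labeled = observe(true_labeled, gain, offset)
    print(f"gain={gain}, offset={offset:+.1f} Da: unlabeled peak "
          f"{unlabeled:.1f} Da, count {count_from_pair(unlabeled, labeled)}")
```

Every row reports 11 Cs even though the absolute peak positions wander by up to about 12 Da, which is the sense in which a single differential measurement needs no calibration.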
The mass-tagging approach can also identify a single base within a sequence. To demonstrate this, Chen tackled a difficult region of chromosome 16 containing one base that JGI had been unable to assign. The mass spectra identified the missing base as an A.
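The article does not say how the single-base assay works. One plausible scheme, offered purely as an assumption: run one reaction per nucleotide type with only that type labeled, convert each mass shift into a count of labeled bases, and compare each count with how many of that base the known part of the fragment already contains. The base whose count comes out one higher fills the gap. The fragment, the counts, and `identify_unknown` are all hypothetical.

```python
# Hypothetical sketch: identify one unassigned base ("N") in an otherwise
# known fragment. Assumes one reaction per nucleotide type, each labeling
# only that type, with mass shifts already converted to base counts.

def identify_unknown(known_seq: str, labeled_counts: dict[str, int]) -> str:
    if known_seq.count("N") != 1:
        raise ValueError("expected exactly one unassigned position")
    # The unknown base is the one observed once more than the known
    # sequence accounts for.
    matches = [base for base, observed in labeled_counts.items()
               if observed == known_seq.count(base) + 1]
    if len(matches) != 1:
        raise ValueError(f"ambiguous or inconsistent counts: {matches}")
    return matches[0]

fragment = "GGCANCCTGA"                       # hypothetical; N is unassigned
counts = {"A": 3, "C": 3, "G": 3, "T": 1}     # hypothetical measured counts
print(identify_unknown(fragment, counts))
```

With these made-up counts the known bases account for two As, so the third A must sit at the unassigned position and the function prints `A`.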
For now, Chen is going through one unfinished fragment at a time. “But we are trying to standardize the assay for high throughput,” he says.
The great thing about the mass-tagging approach, says Chen, is that it is simple enough for any small lab to do. “If they have a MALDI-TOF, they’re ready to go,” he says. Even a lower-end instrument will do; Chen collects his spectra on a simple ABI Voyager. The method is also cost-efficient: the required labeled nucleotides come to about 16 cents per mass-spec measurement.
“It’s very important to validate the quality of these sequencing data and close the gaps,” says Chen. An unassigned base, for instance, may be an important disease-related SNP. Or miscounting the number of nucleotides in a string of repeats can shift the reading frame. Gaps in the sequence may also cause misinterpretation of protein sequences. “And other methods cannot compete in terms of accuracy, specificity, efficiency,” says Chen. “This is going to be a very powerful method for sequence validation.”
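The reading-frame problem is easy to see by translating a sequence twice. This sketch uses the standard genetic code and a made-up coding sequence (a start codon, a run of Cs, and nine downstream bases); the sequences and the `translate` helper are illustrative only.

```python
# Sketch: how miscounting a C run by one base changes the downstream protein.
# Sequences are hypothetical; translation uses the standard genetic code.

BASES = "TCAG"
# Standard code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")

def translate(dna: str) -> str:
    """Translate codon by codon, ignoring a trailing partial codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        b1, b2, b3 = (BASES.index(b) for b in dna[i:i + 3])
        protein.append(AMINO[16 * b1 + 4 * b2 + b3])
    return "".join(protein)

downstream = "TGGAAATAC"
for run in (12, 11):
    seq = "ATG" + "C" * run + downstream
    print(f"{run} Cs: {translate(seq)}")
```

With 12 Cs the hypothetical sequence reads `MPPPPWKY`; with 11, every codon after the run shifts and it reads `MPPPPGN`, a different protein from a one-base miscount.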