NEW YORK, Jan. 30 (GenomeWeb News) - When a researcher at Penn State University offered Dan Peterson the use of a Genome Sequencer 20 for his pine genomics research, he jumped at the chance, and is glad he did.
Peterson, director of the Mississippi Genome Exploration Laboratory at
This type of information will help scientists further define and refine the tree of life, and could be a boon for research into evolution. "There is a lot to be learned from this in terms of understanding the evolution of an organism," Peterson told GenomeWeb News.
All the plant genomes sequenced so far belong to flowering plants, or angiosperms. The gymnosperms, which include conifers, diverged from angiosperms 130 million years ago. "One of the things we are trying to get at is understanding how the genomes of different gymnosperms have changed in comparison to angiosperms," Peterson said. "This is one group that has not been very well explored and one of the major groups on earth."
Sequencing the pine genome is a difficult task: With 21 billion base pairs, it is seven times the size of the human genome. Its mammoth length is due mostly to the large number of repetitive DNA sequences, which do not code for genes but are essential to the evolution of organisms. "Most eukaryotic genomes are mainly repetitive DNA," explained Peterson. "Pine is one of the extreme examples."
Peterson has coupled the 454 instrument with Cot analysis, a technique pioneered in the 1960s that provides a way to separate repetitive DNA sequences from the single- and low-copy "gene rich" regions of the genome. If DNA is denatured by heating and then cooled, sequences will begin to reassociate with complementary strands.
"When that happens, the most common sequences, the ones that are repeated, will find each other and form double stranded DNA faster than the ones that are single copy or low copy," explained Peterson. "The double-stranded repetitive DNA can then be separated from the single-stranded low-copy DNA by hydroxyapatite chromatography."
In 2002, Peterson, then a postdoc at the University of Georgia, and his postdoctoral advisor Andrew Paterson resurrected this old lab technique and showed how Cot analysis could be coupled with DNA cloning and high-throughput sequencing to efficiently capture the unique sequence information in a genome. Since then, Peterson has been using his Cot-based cloning and sequencing technique to identify and study gene-rich regions in species with large genomes.
The 454 sequencer has allowed Peterson and fellow pine researcher John Carlson at Penn State to sequence isolated repeat and single/low-copy components to relatively high coverage without having to do prior DNA cloning. To date, Peterson and Carlson have used the 454 instrument to sequence 28 Mb of random genomic DNA, 22 Mb of highly repetitive DNA, 20 Mb of moderately repetitive DNA, and 31 Mb of single/low-copy DNA.
"With the 454 sequencer, we decided to fractionate the pine genome into highly repetitive, moderately repetitive, and a single/low copy components, and use the 454 to sequence these different components," he said.
According to Peterson, one of the most interesting things to come out of the work is based on the extremely high sequencing coverage obtained for repetitive regions. "You can use contigs assembled from the 454 reads to study the evolution of repetitive elements," he said. Many plant repetitive segments, for example, are retroelements, sequences that evolved from or perhaps gave rise to retroviruses. "They have replicated over and over again," Peterson said. "And over time, they evolve and kind of drift, so the copies are not as similar as they were originally. You can actually determine the age of these elements based on how much divergence you see in them."
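One common way to turn divergence into an age estimate is to correct the observed fraction of mismatched sites for multiple substitutions at the same site (the Jukes-Cantor model) and then divide by an assumed substitution rate. The Python sketch below illustrates that general idea only; it is not Peterson's pipeline. The rate of 1e-8 substitutions per site per year, the function names, and the toy sequences are all placeholders.

```python
import math

# Assumed substitution rate (substitutions per site per year).
# A placeholder for illustration, not a value reported for pine.
ASSUMED_RATE = 1e-8


def p_distance(seq_a, seq_b):
    """Fraction of mismatched sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Correct an observed mismatch fraction for multiple hits at one site."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def estimated_age_years(copy_seq, consensus_seq, rate=ASSUMED_RATE):
    """Age of one repeat copy, from its divergence from the family consensus."""
    return jukes_cantor(p_distance(copy_seq, consensus_seq)) / rate


if __name__ == "__main__":
    consensus = "ACGTACGTACGTACGTACGT"  # toy family consensus
    young = "ACGTACGTACGTACGTACGA"      # 1 mismatch in 20 sites
    old = "ACGAACGTTCGTACCTACGA"        # 4 mismatches in 20 sites
    for name, seq in (("young", young), ("old", old)):
        print(name, f"{estimated_age_years(seq, consensus):,.0f} years")
```

The more a copy has drifted from the consensus, the older the estimate, and the result scales directly with whatever substitution rate is assumed.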
Sequencing repetitive DNA will also help researchers identify genes. "If you know the repeat sequences, you can make sure that you avoid using them in marker development and physical mapping, as they will produce confusing, uninformative results," Peterson explained.
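One simple way to picture this screening step: collect the short subsequences (k-mers) found in known repeats, on either strand, and reject any candidate marker that shares too many of them. The Python sketch below is illustrative only. The k-mer size, the threshold, the function names, and the toy sequences are assumptions, and real projects typically rely on dedicated repeat-masking tools.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_list(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_repeat_index(repeats, k):
    """Set of k-mers seen in any known repeat, on either strand."""
    index = set()
    for repeat in repeats:
        index.update(kmer_list(repeat, k))
        index.update(kmer_list(reverse_complement(repeat), k))
    return index


def repeat_fraction(candidate, repeat_index, k):
    """Share of the candidate's k-mers that also occur in known repeats."""
    candidate_kmers = kmer_list(candidate, k)
    if not candidate_kmers:
        return 0.0
    hits = sum(kmer in repeat_index for kmer in candidate_kmers)
    return hits / len(candidate_kmers)


def filter_markers(candidates, repeats, k=8, max_fraction=0.1):
    """Keep only candidate markers whose repeat content is at or below the threshold."""
    repeat_index = build_repeat_index(repeats, k)
    return {name: seq for name, seq in candidates.items()
            if repeat_fraction(seq, repeat_index, k) <= max_fraction}


if __name__ == "__main__":
    known_repeats = ["ACGTACGTTAGCTAGCTAGGATCC"]        # toy repeat
    candidates = {
        "marker_1": "ACGTACGTTAGCTAGC",                 # lies inside the repeat
        "marker_2": "TTTTGGGGCCCCAAAATTGGCCAA",         # shares no 8-mers with it
    }
    print(sorted(filter_markers(candidates, known_repeats)))
```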
Peterson's repeat evolution work is one area of research that would have been too costly to pursue without a next-generation sequencer. Others share the sentiment that next-generation instruments will enable scientists to explore novel areas of research.
Garth Ehrlich, executive director of the Center for Genomic Sciences at Allegheny-Singer Research Institute, recently said the Genome Sequencer 20 has enabled him to study bacterial transformability. "It completely changes the kind of questions you can ask," Ehrlich said.
Peterson points out that to look at the repetitive sequences, his lab couldn't use the assembly software provided by 454. This software is aimed at assembling sequences from bacteria, viruses, and other small genomes; it is not geared toward assembling sequences from eukaryotic genomes that contain a lot of repetitive DNA. "What 454 has said is, 'We are not going to spend a lot of time developing this new software to go with the instrument.' I can understand that because there is other software out there," said Peterson. "They have said, 'Anyone who wants our basic software can have it; we'll give you the code, and you can do with it what you want.'"
According to Bill Spencer, director of worldwide system sales at 454, the company is currently developing a large genome assembler. "The software's current capacity is limited to approximately 7 to 7.5 million 454 reads, which limits the full genome assembly to bacterial and fungal genomes as well as sections of larger eukaryotic genomes, such as the assembly of BAC sequences."
Peterson said he is at the beginning of his sequencing project, and there are many who are eager for its results. Loblolly pine is the number one crop in the southeastern
Kate O'Rourke covers the next-generation genome-sequencing market for GenomeWeb News.