The recent completion of three-fold coverage of 95 percent of the mouse genome opens up immediate opportunities for comparative genomics and new approaches to experiments with mouse models, researchers said last week.
The Mouse Sequencing Consortium announced on May 8 the public availability of the raw data, some 15 million unique sequence traces, on the new NCBI Trace Archive (www.ncbi.nlm.nih.gov/Traces/trace.cgi), which began receiving data approximately six months ago, as well as on the Ensembl Trace Server (http://trace.ensembl.org). Both sites assist researchers interested in doing their own assembly of the genome or in carrying out large-scale genomic analysis.
As of May 11, 39 Mb of nonredundant finished sequences and 320 Mb of unfinished sequences obtained in a clone-by-clone manner were available in GenBank, while approximately 9 Gb of whole genome shotgun data for the mouse were available in the Trace Archive, according to Deanna Church, an NCBI research fellow.
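The quoted figures can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration, not a method used by the consortium. It assumes that 1 Gb means 10^9 bases and that the roughly 9 Gb of shotgun data corresponds to the roughly 15 million traces. The function names are hypothetical.

```python
# Rough consistency check of the numbers quoted in the article.
# Assumptions (not stated in the source): 1 Gb = 1e9 bases, and the
# ~9 Gb of shotgun data is made up of the ~15 million traces.

GB = 1_000_000_000  # bases per gigabase (assumed convention)


def mean_read_length(total_bases: float, n_traces: int) -> float:
    """Average number of bases per trace."""
    return total_bases / n_traces


def span_at_coverage(total_bases: float, fold_coverage: float) -> float:
    """Length of sequence that total_bases would cover at the given depth,
    assuming perfectly uniform coverage."""
    return total_bases / fold_coverage


total = 9 * GB       # whole genome shotgun data in the Trace Archive
traces = 15_000_000  # individual sequence traces
fold = 3             # three-fold coverage

print(mean_read_length(total, traces))     # 600.0 bases per trace
print(span_at_coverage(total, fold) / GB)  # 3.0 Gb covered three times over
```

Under these assumptions the figures hang together: about 600 bases per trace, and about 3 Gb of sequence sampled three times over.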
The move to higher sequence resolution is proceeding at approximately one thousand clones a month, said Jim Ostell, chief of the NCBI information engineering branch, “which means the mouse is going to a high level of quality in a year or less.”
In the meantime, the archive is being used not just by those sequencing the mouse genome but also by researchers annotating the human genome.
“The mouse is at an interesting position,” said Jim Kent, a University of California, Santa Cruz, researcher. “In many ways, you learn different things from different evolutionary distances from the human. For example, a French group started sequencing the puffer fish. They saw exons conserved, and [at this distance] you’re quite certain it’s real. When you go to the mouse, you start to be able to detect regulatory regions. At the coding exon level, you start to see more exons than the puffer fish.” Kent estimated the figure at approximately 80 to 85 percent.
Kent and others (including teams at the University of California, Berkeley, Washington University, the Sanger Center, and Columbia University) are designing new algorithms that align the mouse traces to the human genome in order to annotate it. These “gene predictors,” according to Kent, “offer a very strong signal” for predicting the structures of genes. They may also cut down on the computational time needed to do analyses, Church pointed out. Applying new algorithms to the growing data will also “let us find about 20 percent of human genes that otherwise we would have missed,” explained Kent.
The currently available clone-by-clone sequences can also be used to fashion new mouse models for experimentation. “People can take clones to make transgenics or make a knock-out of a gene you’re interested in,” said Church. “It’s quick and easy for mouse researchers to translate that data into their labs.”
Kent, while immersed in the mouse data, is also looking to the next genomes for more information on regulatory regions. “The lemur might be ideal,” he speculated. But “there is something to be said for sequencing the chimpanzee. Almost everything will be conserved. There you start looking for what is not conserved, for what makes us uniquely human.”