COLD SPRING HARBOR, NY, Oct. 31 (GenomeWeb News) - It's technically possible to computationally reconstruct the genome of the ancestor of all placental mammals, according to David Haussler of the University of California, Santa Cruz, who is spearheading a collaborative effort to deliver the assembly of such a genome to the research community.
Haussler, a professor of biomolecular engineering at UCSC, said that an effort to "reconstruct the evolutionary history of each base in the human genome" from the time of the so-called Boreoeutherian ancestor, which lived around 75 million years ago, is "the grand challenge of human molecular evolution."
Speaking Saturday at the annual Genome Informatics conference here at Cold Spring Harbor Laboratory, Haussler outlined several pilot studies that he and his collaborators have conducted as a proof of principle for such a project. The group published a paper in the December issue of Genome Research describing how comparative genomics was used to computationally reconstruct the ancestral version of the CFTR locus, a region that spans more than 1 million base pairs and contains 10 genes, among them the gene involved in cystic fibrosis.
Since then, Haussler said, he and his collaborators - including Webb Miller at
Work so far "indicates feasibility," Haussler said, with an overall accuracy of around 91.5 percent - a number he hopes to bring up to around 98 percent.
Haussler acknowledged that there is some skepticism about the accuracy of the reconstruction. But he said that he is confident in his team's method and validation process.
He and his colleagues have developed a software program that simulates the evolution of DNA over millions of years, statistically modeling the substitutions, insertions, deletions, and other mutations that accumulate over time. They test the program on a hypothetical ancestral DNA sequence, artificially evolving it to produce simulated "modern" DNA sequences for multiple species.
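To make the simulation step concrete, here is a minimal Python sketch of forward-simulating DNA evolution down a species tree. It is not the UCSC group's simulator: the per-site rates, the single-base indels, the tree layout and the names `mutate`, `evolve`, `species_a` and so on are all hypothetical choices made for illustration. A realistic simulator would also model variable-length indels, rate variation across sites and branch lengths calibrated in years.

```python
import random

BASES = "ACGT"


def mutate(seq, sub_rate, ins_rate, del_rate, rng):
    """One evolutionary 'step' (hypothetical toy model): each site is
    independently deleted, substituted, or kept, and a random base may be
    inserted after it."""
    out = []
    for base in seq:
        r = rng.random()
        if r < del_rate:
            continue  # deletion: drop this base
        if r < del_rate + sub_rate:
            # substitution: replace with one of the three other bases
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
        if rng.random() < ins_rate:
            out.append(rng.choice(BASES))  # single-base insertion
    return "".join(out)


def evolve(seq, tree, rates, rng):
    """Evolve seq down a tree written as {name: (steps, subtree)}, where
    subtree is None for a modern species (leaf). Returns {leaf: sequence}."""
    leaves = {}
    for name, (steps, subtree) in tree.items():
        child = seq
        for _ in range(steps):
            child = mutate(child, *rates, rng)
        if subtree is None:
            leaves[name] = child
        else:
            leaves.update(evolve(child, subtree, rates, rng))
    return leaves


rng = random.Random(42)
ancestor = "".join(rng.choice(BASES) for _ in range(1000))
# Hypothetical tree: two sister species plus one more distant species.
tree = {
    "clade_1": (5, {"species_a": (3, None), "species_b": (3, None)}),
    "species_c": (8, None),
}
# rates = (substitution, insertion, deletion) per site per step (invented)
modern = evolve(ancestor, tree, (0.01, 0.001, 0.001), rng)
```

Because the true ancestor is known in a simulation like this, any reconstruction run on `modern` can be scored exactly against `ancestor`.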
Next, they apply their computational reconstruction procedure, which is based on multiple alignments of sequences from many species, to work backward and recreate the hypothetical ancestral sequence. Comparing the original and the reconstructed versions of this ancestral sequence shows how accurate the method is.
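The reconstruct-and-compare step can be illustrated with a deliberately simple stand-in. The sketch below uses Fitch's small-parsimony algorithm on each column of a gap-free alignment to infer the root base, then scores the result as the fraction of matching positions. This is a swapped-in textbook technique, not the team's procedure, which the article does not detail and which must also handle insertions and deletions. The tree, the four-base alignment and the "true" ancestor `ACTT` are invented for illustration.

```python
def fitch_set(tree, column):
    """Fitch's small-parsimony bottom-up pass. tree is a species name (leaf)
    or a (left, right) pair; column maps species -> base at one aligned site.
    At the root this returns every base that appears in some
    most-parsimonious reconstruction."""
    if isinstance(tree, str):
        return {column[tree]}
    left = fitch_set(tree[0], column)
    right = fitch_set(tree[1], column)
    return (left & right) or (left | right)


def reconstruct_ancestor(tree, alignment):
    """Reconstruct the root sequence column by column from a gap-free
    multiple alignment {species: aligned_sequence}."""
    length = len(next(iter(alignment.values())))
    ancestor = []
    for i in range(length):
        column = {species: seq[i] for species, seq in alignment.items()}
        # Ties are broken alphabetically; a real method would use a model.
        ancestor.append(min(fitch_set(tree, column)))
    return "".join(ancestor)


def accuracy(true_ancestor, predicted):
    """Fraction of positions where the reconstruction matches the truth."""
    matches = sum(a == b for a, b in zip(true_ancestor, predicted))
    return matches / len(true_ancestor)


tree = (("human", "chimp"), ("mouse", "rat"))
alignment = {"human": "ACGT", "chimp": "ACGA", "mouse": "ACTT", "rat": "GCTT"}
predicted = reconstruct_ancestor(tree, alignment)
print(predicted, accuracy("ACTT", predicted))  # prints: ACGT 0.75
```

The third column shows why reconstruction accuracy falls short of 100 percent: primates carry G and rodents carry T, so parsimony cannot choose between them, and the arbitrary tie-break picks the wrong base.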
After applying the reconstruction process to real genomic sequences, the team validates the predicted ancestral genome by simulating its evolution toward species that were not included in the set used to derive it. Comparing each simulated descendant genome with the corresponding real genome then gives a measure of how accurate the predicted ancestral genome is.
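The article does not say which statistic the team uses to compare simulated and real genomes. One plausible reading, sketched below purely as an assumption, is to evolve the predicted ancestor many times under a substitution-only model and check whether the real held-out genome diverges from the prediction by about as much as the simulated descendants do. The names `validation_gap` and `simulate_substitutions`, the parameter `p_sub` and all sequences are hypothetical; the substitution-only model avoids the alignment problem that real indels would create.

```python
import random
from statistics import mean

BASES = "ACGT"


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def simulate_substitutions(seq, p_sub, rng):
    """Substitution-only evolution: each site changes to a different base
    with probability p_sub."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < p_sub else base
        for base in seq
    )


def validation_gap(predicted_ancestor, real_outgroup, p_sub, replicates=200, seed=0):
    """Observed identity (prediction vs real held-out genome) minus the mean
    identity between the prediction and its simulated descendants.
    A gap near 0 means the real genome looks like a plausible descendant."""
    rng = random.Random(seed)
    expected = mean(
        identity(predicted_ancestor, simulate_substitutions(predicted_ancestor, p_sub, rng))
        for _ in range(replicates)
    )
    observed = identity(predicted_ancestor, real_outgroup)
    return observed - expected


predicted = "ACGT" * 250  # hypothetical reconstructed ancestor
# Stand-in for a real held-out genome, here itself produced by simulation.
held_out = simulate_substitutions(predicted, 0.1, random.Random(1))
gap = validation_gap(predicted, held_out, p_sub=0.1)
```

A large positive or negative gap would suggest either that the predicted ancestor is wrong or that the assumed divergence `p_sub` for the held-out lineage is off, so in practice the two would need to be estimated together.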
Haussler said that scaling the project up to the entire human genome is a "captivating" prospect that would provide valuable insights into human evolution, but that it would ideally require the complete genome sequences of around 20 mammals - far more than the National Human Genome Research Institute plans to sequence to completion.
"We have to keep sequencing genomes," he said.
Miller said that the project is using the genomes of 11 organisms so far - human, chimp, rat, mouse, dog, macaque, rabbit, cow, armadillo, elephant, and tenrec. Even though most of these have been sequenced only to very low coverage, Miller said that the team has already used this data to reconstruct the entire ancestral genome "a few times," although the results are still "preliminary."
Estimates of the project's timeline vary. Some participants said it could take as long as two years for a completely assembled ancestral genome to reach the public domain, while other sources involved in the project said an initial draft assembly could be available within the next six months.