ROCKVILLE, Md. -- Celera Genomics said on June 1 that it had begun to deliver partial assemblies of the human genome to its database subscribers and that it expected to announce the completed assembly this month. An assembled genome, the company explained, “is one in which the location and exact order of letters of genetic code along the chromosomes is known.”
Gene Myers, who gained notoriety in the pages of The New Yorker last week as a perpetually chilled, Nerf gun-shooting, emerald earring-wearing computer scientist, is the brain behind Celera’s assembly project. But assembling the human genome sequence, a task Myers and his team are expected to complete any day now, is just the first stage in his career at Celera, where he is director of informatics research.
After assembly, computational biologists will have their work cut out for them, Myers told BioInform. “I don’t think that our tools and capabilities in assisting biologists to investigate whole genomes at systemic levels are up to par. The right tools and right infrastructures have not been built.”
While existing tools are good, lack of integration is the weak link, Myers asserted. “What you are striving for is to allow a biologically focused investigator to sit down at a machine and generate conjectures for experiments,” he said. “I don’t think anyone’s gotten near the right formula for that.”
In a lecture at the University of Wisconsin’s fourth Frontiers of Genomics conference last month, Myers reviewed the efforts that led to the successful assembly of the Drosophila genome. In an interview afterward, he alluded to comments by Walter Gilbert of Harvard University, who said that the amount of genomic information is growing by a factor of 10 every five years, growth that will stress the “limits of efficiency.”
“There’s going to be a bottleneck in terms of processing this data in an effective and timely way,” Myers stressed. “The kind of science you do is a lot different if you have to wait five days for the results of query versus if you have to wait five minutes.”
Myers said his job assembling Celera’s human genome sequence has been a computational problem approximately 30 times bigger than the one he solved for the Drosophila genome. The task required reengineering the algorithms that were used to assemble Drosophila. “We’re not changing the logic those algorithms are performing, but we are changing the way the computations are structured so they’ll fit in available memory machines,” he explained.
Memory for the process grows linearly, he said, noting that Drosophila consumed 20 gigabytes of disk space. Scaling up with no modifications would mean the human genome would take 600 gigabytes. “We’re exploiting and using more parallelism,” he explained, adding that the assembly effort is equivalent to “900 CPU days.” Myers said Celera’s original plan called for 10 processors, which would have solved the problem in 90 days, but more processors had been added and were still being added last month.
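The figures Myers cites follow from simple proportional arithmetic, which can be sketched in a few lines of Python. The function and variable names here are illustrative, not Celera's, and the wall-clock estimate assumes ideal parallel speedup, which real systems rarely achieve.

```python
# Figures quoted in the article; names are illustrative assumptions.
DROSOPHILA_GB = 20   # disk space reported for the Drosophila assembly
SCALE_FACTOR = 30    # human problem is roughly 30 times larger
CPU_DAYS = 900       # total compute reported for the human assembly


def scaled_storage_gb(base_gb, factor):
    """Linear scaling: storage grows in proportion to problem size."""
    return base_gb * factor


def wall_clock_days(cpu_days, processors):
    """Ideal elapsed time, assuming work divides evenly across processors."""
    return cpu_days / processors


print(scaled_storage_gb(DROSOPHILA_GB, SCALE_FACTOR))  # 600
print(wall_clock_days(CPU_DAYS, 10))                   # 90.0
```

Under the same idealized assumption, doubling the processor count to 20 would cut the estimate to 45 days, which is why adding processors shortens the schedule.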
Myers said the bottom-up assembly process “starts from individual pieces, nucleates them into small contigs, decides if they are correctly assembled, builds them into scaffolds.” He noted that assembly is not merely an industrial operation but a scientific endeavor. “We’re not making toasters here,” he quipped.
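To give a rough sense of what building contigs "from individual pieces" means, the toy Python sketch below greedily merges short reads that share a suffix-prefix overlap. It is not Celera's assembler: the greedy strategy, the function names, and the minimum overlap of three letters are all assumptions made for illustration, and the sketch omits the later steps Myers describes, namely checking whether contigs are correctly assembled and linking them into scaffolds.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a matching a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no remaining pair overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical reads: three overlap into one contig, one stands alone.
reads = ["ATGCGT", "CGTACC", "ACCTTG", "GGGAAA"]
print(greedy_contigs(reads))  # ['GGGAAA', 'ATGCGTACCTTG']
```

A greedy merge like this can be misled by repeated sequence, one reason a real assembler needs the correctness checks the sketch leaves out.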
Myers and colleagues developed the computer code for the project in 14 months. “If you consider the other codes being used out there, those are codes that have five or ten years of history,” he said.
As for what he will do when the assembly is complete, Myers said he is in no rush to get back to the Arizona desert, where he worked at the University of Arizona before taking leave to join Celera. “I’m working on building a first-class research department at Celera,” he said. The job, he acknowledged, is all-consuming. How does he handle the pressure? “I have three cats. My cats keep me together,” Myers said.