By Monica Heger
EdgeBio will sequence the reference genomes and develop the bioinformatics tools used to judge contestants in the Archon Genomics X Prize, the company said last week at a press conference announcing the competition's revised guidelines.
Organizers of the Archon Genomics X Prize, which will award $10 million in early 2013 to the first team to sequence the genomes of 100 centenarians within 30 days for $1,000 or less per genome, have changed the competition's timeline, format, and goals. The aim is now to produce a 'medical-grade' reference genome with an error rate of one per million bases (CSN 10/26/2011).
While a medical-grade genome could be achieved using Sanger sequencing, Craig Venter, a member of the scientific advisory board for the competition, told Clinical Sequencing News at last week's event that "the goal is to get highly accurate, haplotype-phased genomes that can be done very quickly."
Additionally, a medical-grade reference genome could help bring next-gen sequencing into the clinic, said EdgeBio CEO Dean Gaalaas, because it would give the regulatory agencies a dataset by which to evaluate laboratories developing sequencing-based tests.
A medical-grade reference genome will enable regulators to "gauge a lab's proficiency at running these [sequencing] instruments," he said.
A Medical-Grade Genome
In March, AGXP organizers said they had expanded the goal of the competition while keeping the rules and prize the same, because its original intent — to provide financial incentive to drive the advancement of sequencing technology — was not as relevant as it was when the project first kicked off in 2006 (IS 3/1/2011).
At the time, some researchers said that the criteria for winning the competition were not stringent enough to result in a true clinical-grade genome. For instance, accuracy requirements were initially one error in 100,000 bases, and the cost had to be under $10,000 per genome — numbers that would not be suitable for clinical purposes, Complete Genomics' chief scientific officer Rade Drmanac told In Sequence at the time.
However, at a press conference in New York last week, and in a subsequent commentary published in Nature, the organizers presented the competition's final criteria, which specify that the genomes must be sequenced for $1,000 or less each with an error rate of one in one million bases.
The competition's goal is now to help facilitate the transition of next-generation sequencing into the clinic.
"The outcome of such a large-scale approach will be close to the reality of 'medical-grade' genomes that could be used as models for clinical applications," Larry Kedes and Grant Campany, AGXP organizers, wrote in the Nature commentary.
The bar is now so high that "no commercial [sequencing] technology in its present form could win the X Prize," Venter said at last week's event.
Using today's technology, sequencing a genome with two different platforms will give two different answers, he said, which is not desirable for medical applications.
Venter said the X Prize competition should also improve understanding of how a person's genome relates to his or her health, since the subjects' longevity suggests a genetic contribution to their wellness.
The centenarians that are being sequenced as part of the competition probably have genes that increase disease risk, said Venter, but "somehow there are other genes in the genome that overcame that." Piecing together which genes confer disease risk and which are protective will take thousands of complete human genomes plus corresponding phenotypic information.
Sequencing costs have dropped dramatically in recent years, but scientists will not make real headway in understanding the function of many genes and variants until thousands of genomes can be sequenced accurately and affordably.
"My genome is not much more interpretable today than it was four or five years ago," Venter told CSN. Venter and colleagues at the J. Craig Venter Institute sequenced his genome using Sanger technology in 2007 (IS 9/7/2007).
Judging the Competition
The competition will run from Jan. 3, 2013, to Feb. 2, 2013. To win the $10 million prize, a team must sequence all 100 centenarian genomes in that window at a cost of $1,000 or less per genome, with an accuracy of one error per one million bases, 98 percent completeness, and a complete haplotype. The winner must also identify insertions, deletions, and rearrangements.
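To make the combined bar concrete, here is a minimal sketch in Python that encodes the grand-prize thresholds reported above and checks a results summary against them. It is purely illustrative and is not EdgeBio's or the organizers' judging software. The `Entry` fields, the function names, and the figure of roughly 3.1 billion bases per haploid human genome are assumptions. The three fallback prizes are not modeled, because their thresholds were not reported.

```python
from dataclasses import dataclass

# Grand-prize thresholds as reported for the final AGXP rules.
MAX_ERROR_RATE = 1e-6          # one error per one million bases
MIN_COMPLETENESS = 0.98        # 98 percent completeness
MAX_COST_PER_GENOME = 1_000    # US dollars
REQUIRED_GENOMES = 100
MAX_DAYS = 30

# Assumption (not from the article): roughly 3.1 billion bases per haploid
# human genome. A phased diploid genome would roughly double this.
HAPLOID_BASES = 3_100_000_000


@dataclass
class Entry:
    """Hypothetical summary of one team's results; field names are illustrative."""
    genomes_sequenced: int
    days_taken: int
    cost_per_genome: float
    error_rate: float
    completeness: float
    fully_haplotyped: bool
    calls_indels_and_rearrangements: bool


def max_errors_per_genome(bases: int = HAPLOID_BASES) -> float:
    """Errors still tolerated per genome at the maximum allowed error rate,
    counted over the minimum fraction of bases that must be called."""
    return bases * MIN_COMPLETENESS * MAX_ERROR_RATE


def meets_grand_prize(e: Entry) -> bool:
    """True only if every reported criterion is met at the same time."""
    return (
        e.genomes_sequenced >= REQUIRED_GENOMES
        and e.days_taken <= MAX_DAYS
        and e.cost_per_genome <= MAX_COST_PER_GENOME
        and e.error_rate <= MAX_ERROR_RATE
        and e.completeness >= MIN_COMPLETENESS
        and e.fully_haplotyped
        and e.calls_indels_and_rearrangements
    )


if __name__ == "__main__":
    # About 3,038 errors tolerated per haploid genome under the assumed size.
    print(round(max_errors_per_genome()))
    # Throughput implied by 100 genomes in 30 days: about 3.3 genomes per day.
    print(round(REQUIRED_GENOMES / MAX_DAYS, 1))

    passing = Entry(100, 30, 950.0, 5e-7, 0.985, True, True)
    too_noisy = Entry(100, 30, 950.0, 1e-5, 0.985, True, True)
    print(meets_grand_prize(passing), meets_grand_prize(too_noisy))
```

Even at one error per million bases, a genome of this assumed size would still carry a few thousand errors, and an entry that misses any single criterion, here the error rate, falls short of the grand prize.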
If no team meets all the requirements for the $10 million prize, three separate prizes will be awarded for accuracy, completeness, and haplotyping.
EdgeBio will be responsible for creating the reference genomes and bioinformatic tools that will be used to judge the entries, a task it expects to complete by next December, said Gaalaas.
The validation protocol will have three phases, said Gaalaas. The first will involve sequencing two publicly available, well-characterized genomes, such as members of the YRI and CEU trios that have been sequenced by both Complete Genomics and Illumina.
Per the AGXP requirements, EdgeBio will construct fosmid libraries for these two genomes and sequence them with two different next-generation sequencing technologies, to determine whether fosmid library sequencing should be done with multiple technologies.
Depending on available resources, the EdgeBio team may also sequence the fosmid libraries on the Ion Torrent PGM using the 318 chip in order to "complement the use of Illumina and SOLiD platforms by helping to refine sequence identity in difficult-to-sequence regions of the genome," the company wrote in its response to the AGXP committee's RFP.
In addition, one of the two genomes will undergo whole-genome sequencing on the Illumina HiSeq 2000, Life Technologies' 5500xl, and Complete Genomics' platform to compare completeness and bias across the technologies.
Validation will be done using Sanger sequencing, and the samples will also be genotyped.
The second phase will involve sequencing and validating the contest genomes. EdgeBio will receive cell lines from 110 centenarians, although contestants will only have to sequence 100. For 25 of those genomes, EdgeBio will create fosmid libraries and sequence them with two different technologies. Gaalaas said the team would most likely use Illumina and Complete Genomics, but that additional sequencing platforms may also be used. Microarray analysis will be performed on all 110 samples.
In the third and final phase, the team will develop the bioinformatics tools to judge the competition, said Gaalaas. These new tools will "provide a uniform framework for managing and assessing the data," he said, adding that "some of the things the [organizers] are asking for don't exist yet," such as methods for haplotype phasing.
Gaalaas said that there are still a lot of open questions about the technology that will ultimately be used for much of the bioinformatics, but that the team is forming partnerships with experts in different areas, and is also keeping an open mind, considering technologies such as the Pacific Biosciences RS and OpGen for haplotyping.
Have topics that you'd like to see covered by Clinical Sequencing News? Contact the editor at mheger [at] genomeweb [.] com.