Five months after the launch of the 1000 Genomes Project, 454 Life Sciences, Applied Biosystems, and Illumina said they have joined the effort as direct participants.
All three second-generation sequencing platforms sold by these vendors were already being used by genome centers in pilot studies of the project, which was announced in January.
With the commitments from the three companies, as well as additional capacity made available by some of the participating centers, the project can now increase the sequence coverage for two of its three pilot projects.
Terms of their participation call for each company to contribute 75 gigabases of sequence data free of charge to two of the ongoing pilots. The companies are expected to provide more sequence data to the full project that will follow.
ABI, meanwhile, is also providing an undisclosed amount of funding for SOLiD instruments and reagents to consortium member Baylor College of Medicine’s Human Genome Sequencing Center as part of a previously announced collaboration, according to Francisco De La Vega, vice president for SOLiD system applications and bioinformatics at ABI. Baylor is contributing an additional 200 gigabases of SOLiD data to the pilot projects.
The companies, which are adding members to various project committees and groups, will also help organize the project and the data analysis (see Paired Ends in this issue).
This is not the first time that instrument vendors have participated directly in a large-scale international genomics project. Illumina and Perlegen took part in the International HapMap Project, which ended in 2005, though they received funding from the National Human Genome Research Institute in addition to providing some resources of their own.
The 1000 Genomes Project builds on the International HapMap Project and aims to catalog genetic variation in humans that is present at low frequencies: in at least 1 percent of individuals across the genome and in at least 0.5 percent of people within genes (see In Sequence 1/22/2008).
During a one-year pilot phase that seeks to find the best sequencing strategies, methods, and protocols, and to test different sequencing platforms, the project is conducting three pilot projects.
Under the first pilot, the group will sequence, at low-coverage depth, 180 samples: 60 unrelated samples each from HapMap European ancestry (CEU), African Yoruban (YRI), and a combination of Han Chinese (CHB)/Japanese (JPT) populations.
The second pilot will sequence, with high coverage, six samples: two trios of parents and child from the YRI and CEU populations.
In the third pilot, approximately 1,000 gene regions and conserved elements will be sequenced in about 1,000 individuals at high coverage.
Initial data has already been generated by five participating sequencing centers (see In Sequence 5/13/2008): the Wellcome Trust Sanger Institute, the Shenzhen branch of the Beijing Genomics Institute, Baylor College of Medicine’s Human Genome Sequencing Center, the Broad Institute of MIT and Harvard, and the Washington University Genome Center.
The three companies will help to increase the coverage for the first two pilot projects.
For the first project, the companies will each generate 2X coverage on 10 CEU samples, or 60 gigabases of sequence data. This will allow the project to increase coverage in that population from 2X to 4X.
The original plan to do 2X coverage “came out of an argument that this would be an efficient way to capture shared variation,” according to Richard Durbin of the Sanger Institute. “There has been further discussion whether, possibly, the 4X coverage might be a preferable alternative, which is why we are also evaluating that option,” he said.
For the second pilot, each company will contribute 15 gigabases, or 5X coverage, for one of the two trio children. As a result, instead of generating 20X coverage for each of the six samples, the two children will now be covered at 40X depth.
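The per-company figures quoted for the two pilots can be checked with a little arithmetic. The short Python sketch below is only an illustration: it assumes a genome size of roughly 3 gigabases, which is not stated in the article but is implied by "15 gigabases, or 5X coverage," and the function name `gigabases_needed` is hypothetical.

```python
# Assumption: ~3 gigabases per human genome, implied by the article's
# "15 gigabases, or 5X coverage" (15 / 5 = 3). Not stated in the source.
GENOME_SIZE_GB = 3.0


def gigabases_needed(coverage, samples, genome_size_gb=GENOME_SIZE_GB):
    """Sequence yield, in gigabases, to cover `samples` genomes at `coverage`-fold depth."""
    return coverage * samples * genome_size_gb


# First pilot: each company generates 2X coverage on 10 CEU samples.
pilot1_per_company = gigabases_needed(coverage=2, samples=10)

# Second pilot: each company contributes 5X coverage for one trio child.
pilot2_per_company = gigabases_needed(coverage=5, samples=1)

# Combined per-company contribution across both pilots.
total_per_company = pilot1_per_company + pilot2_per_company

print(pilot1_per_company, pilot2_per_company, total_per_company)
```

Under that assumption, the two pilot contributions add up to the 75 gigabases each company has committed.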
“These two children will be real test cases, with all the data public and a lot of focused analysis,” Richard Durbin, co-chair of the project’s steering committee and a principal investigator at the Sanger Institute, told In Sequence last week. “[They] will give us the most detailed picture [to date] of whole-genome variation on an individual basis.”
Both ABI and Illumina have already sequenced internally the adult male Yoruban sample that is part of one of the trios (see In Sequence 2/26/2008), and have submitted their data to public databases, but that data will not become part of the project.
“It will be very nice validation data,” said Lisa Brooks, director of the genetic variation program at NHGRI, which funds the 1000 Genomes Project. “A major part of the pilot project is to produce these data and then see what the data quality are, how well structural variants can be detected, what depth of coverage is necessary in order to look at lots of samples, and so the more validation data we have in order to address those questions, the better.”
According to Durbin, the three companies are also expected to contribute additional data to the full project, proportional in size to their contribution to the pilot phase.
The design for the full project, which will involve sequencing the genomes of at least 1,000 individuals, and maybe as many as 1,500, has not been finalized. But it is already clear that it will require at least five times more sequence data than the pilot projects, according to Durbin.
That might be one reason why the project accepted the vendors’ offer to chip in some data.
Durbin said that several sequencing companies approached project organizers after the launch in January to ask whether they could directly participate.
The organizers decided that their participation would help both sides, and “the sort of price of entry was to produce data for the project,” said Brooks. In addition to contributing data, she said, “the companies also bring very good expertise, of course, on their own platforms. So when we are going into the nitty-gritty of something like transferring the data, or ‘What does this quality score mean on this platform?,’ it’s really great to have the people on the phone calls who developed those algorithms.”
Durbin said the organizers also talked to several other undisclosed sequencing vendors “who are sort of in the wings, but who were not ready to join and contribute at this stage.” He added that “we will be happy for other companies to join, in principle, on the same terms.”
Helicos Biosciences and Danaher Motion-Dover both currently offer commercial sequencing systems for sale, but according to Brooks, neither company has expressed an interest, as of now, in joining the project.
Reasons to Participate
By joining the consortium, Illumina, ABI, and 454 hope to showcase their technologies for human genome resequencing applications and to expose researchers inside and outside of the project to more data from their instruments. All data will be submitted to public databases.
“We were very interested in whether SOLiD could be used for whole-genome resequencing studies, and [this] would be a good demonstration project for that,” said ABI’s De La Vega.
“We also believe that having us involved in a project like that will allow us to hear more closely the voice of the customers, what are their needs, what are the problems they are facing, at least in this particular application,” he added.
In addition, ABI hopes that its contribution will spur more researchers to work with SOLiD data and to develop new analysis tools.
“I definitely think that there is going to be more of an incentive for developing tools that can use SOLiD data in its native format,” De La Vega said. For example, the Sanger Institute’s MAQ alignment software can now support SOLiD data, he said, and the Mosaik aligner/assembler from Gabor Marth’s group at Boston College “is also starting to support SOLiD.”
“We absolutely believe that our technology has very significant advantages over others’” both in terms of throughput and accuracy, he added. “I think that potential customers that are looking to make a decision will look at these data and it will be clearer to them what the benefits of the technology are.”
Roche’s subsidiary 454 believes that its own technology is the sine qua non for the project. “We believe that you cannot sequence a human genome without high-quality long reads,” said Michael Egholm, 454’s vice president of R&D.
The company, which recently sequenced and published the genome of Jim Watson (see In Sequence 4/22/2008), was “founded on a vision of doing routine human sequencing,” he said, and needs “to stay engaged with initiatives such as the 1000 Genomes Project.”
“We also think that it is important that a lot of 454 data gets submitted [to the project], so the three different datasets can be compared for quality,” Egholm said. “It’s likely that it’s a mixture of reads [from different platforms] that needs to be used.”
Besides participating in this project, Egholm said he and his colleagues are “still very bullish on our ability in the not-too-distant future to do de novo assemblies of mammalian genomes” using the company’s 3-kilobase and 20-kilobase paired-end libraries.
According to David Bentley, Illumina’s chief scientist, the 1000 Genomes Project is “a true test of technologies,” and Illumina’s sequencing technology, which has been geared “very much towards human genome sequencing,” is a good match.
He and his colleagues “can benefit from seeing our technology used by those who really are experts looking at the results,” he said. Though Illumina has considerable internal expertise, “it’s always nice to have many more pairs of eyes looking at what’s coming out and trying to get more out of it.”
Previous academic collaborations, for example with the Sanger Institute analysis group on sequencing the X chromosome and a HapMap sample, have been “very beneficial,” he said.
Besides testing the technologies, the project might also ready them for future large-scale human sequencing studies.
The 1000 Genomes Project “is probably going to be the last set [of studies] looking at un-phenotyped samples,” Brooks said. “Disease studies will be clearly the next step.”