NEW YORK (GenomeWeb) – Regeneron Pharmaceuticals said today that it has formed a pre-competitive consortium with AbbVie, Alnylam Pharmaceuticals, AstraZeneca, Biogen, and Pfizer to sequence the exomes of all 500,000 participants in the UK Biobank.
"All of us involved have a shared belief in the power of genetics to facilitate and guide drug discovery and development," said Aris Baras, vice president and head of the Regeneron Genetics Center (RGC), a wholly-owned subsidiary of Regeneron.
Under the agreement, AbbVie, Alnylam, AstraZeneca, Biogen, and Pfizer will each contribute $10 million to the project. Regeneron will provide an undisclosed amount of its own funding, and the RGC will conduct the sequencing for the project. Additional companies are currently considering joining the consortium.
The goal is to sequence the exomes of all 500,000 biobank participants by the end of 2019, and to make all sequence data available to other researchers, in accordance with UK Biobank's access policies, by the end of 2020. Consortium members will have exclusive access to the data for a limited period of time — between six and 12 months — and plan to publish their research findings in peer-reviewed journals or on open-source sites.
The new consortium builds on an earlier initiative between Regeneron, the UK Biobank, and GlaxoSmithKline, announced last March, to sequence the exomes of all 500,000 UK Biobank participants. At the time, Regeneron and GSK committed to funding the first 50,000 exomes, which they planned to complete by the end of 2017, followed by a nine-month exclusivity period. The original expectation was that sequencing the entire biobank would take three to five years.
Baras said this timeline has now been significantly shortened, and sequencing of the entire set will be completed by the end of next year. "That is a pretty major upgrade," he said, adding that it will ensure that other researchers have access to the data much earlier than originally planned.
GSK is not involved in the new initiative, though. "We have enjoyed working with GlaxoSmithKline on the first phase and welcome their participation in this consortium, but they are not involved, at least at the moment, in this new project," Baras said.
Sequencing of the 50,000 exomes under the initial project will be completed later this month, he added, and the data will become available to other researchers later this year.
Last year, the UK Biobank had estimated that sequencing the exomes of all 500,000 participants would cost on the order of $150 million, or about $300 per exome. Baras said that sufficient funding is available to complete the entire project, pointing out that the cost of exome sequencing differs between centers, and that the RGC has "tremendous efficiencies" and "can provide very high quality at a good price."
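The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only numbers stated in this article; the variable and function names are illustrative, and the "gap" it computes is measured against last year's UK Biobank estimate, not against the RGC's actual costs or Regeneron's undisclosed contribution, neither of which is public.

```python
# Figures as quoted in the article; names are illustrative assumptions.
TOTAL_PARTICIPANTS = 500_000
ESTIMATED_TOTAL_COST_USD = 150_000_000   # UK Biobank estimate from last year
PARTNER_CONTRIBUTION_USD = 10_000_000    # per contributing partner
PARTNERS = ["AbbVie", "Alnylam", "AstraZeneca", "Biogen", "Pfizer"]


def cost_per_exome(total_cost_usd: int, n_exomes: int) -> float:
    """Average cost per exome implied by a total budget."""
    return total_cost_usd / n_exomes


per_exome = cost_per_exome(ESTIMATED_TOTAL_COST_USD, TOTAL_PARTICIPANTS)
partner_total = PARTNER_CONTRIBUTION_USD * len(PARTNERS)

# Difference between the older estimate and the disclosed partner funding.
# This is NOT the real shortfall: Regeneron's own funding is undisclosed,
# and the RGC's per-exome cost may differ from the estimate.
gap_vs_estimate = ESTIMATED_TOTAL_COST_USD - partner_total

print(f"Implied cost per exome: ${per_exome:,.0f}")
print(f"Disclosed partner funding: ${partner_total:,}")
print(f"Gap versus last year's estimate: ${gap_vs_estimate:,}")
```

The disclosed partner contributions come to $50 million, or a third of the earlier $150 million estimate, which is consistent with Baras' point that per-exome costs differ between centers and that Regeneron is adding funding of its own.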
The RGC currently uses Illumina NovaSeq instruments, as well as some legacy HiSeq sequencers, for its exome sequencing pipeline and said late last year that it is in the process of increasing its capacity to 400,000 to 500,000 exomes per year.
For now, the plan is to continue sequencing exomes, not genomes, of the remaining biobank participants, although others might plan to sequence the full genomes of the UK Biobank cohort further down the line.
All samples have already been genotyped, and additional variants have been imputed, so genome-wide data is already available. "The big missing piece was full sequences of coding regions," Baras said. "Above and beyond that, at least in our opinion and many others' opinions, the incremental value of whole-genome sequencing is really not worth the time and cost. At some future stage, it will make sense."
The exome data will be released in batches several times per year, he said, first to consortium members and later to the UK Biobank and researchers who have access to it.
The biobank (which is funded by the UK's Medical Research Council, the Wellcome Trust, the Department of Health, the Welsh and Scottish governments, the British Heart Foundation, Cancer Research UK, and Diabetes UK) has health-related data available for all its participants, including de-identified medical records and imaging data, as well as blood and other biological samples. That, the researchers hope, will allow them to link genetic variation with biology and disease. "We think this is a tremendous resource," Baras said.