NEW YORK (GenomeWeb) – A new release of genetic data from half a million individuals by the UK Biobank this month has been accompanied by acute interest from the genomics community, which views the resource as a "game changer" for anyone working in human genetics.
According to the initiative, which has collected data on a total of 500,000 individuals in the UK, more than 300 researchers from 139 different institutes around the world have already requested access to the full dataset since it became available in mid-July. Moreover, about 20 papers related to genetics have already appeared based on an earlier, partial release of the data, which organizers believe is just the start.
Since 2013, the UK Biobank has made a more limited set of the data, covering about 150,000 subjects, available to thousands of researchers. Encrypted, de-identified data on all 500,000 individuals from the initiative is now being offered to approved researchers in two tranches. Phenotypic data on participants is being made available directly via the biobank, while the genetic data, consisting of samples genotyped using custom Affymetrix arrays, has been released through the European Genome-phenome Archive (EGA), a joint resource managed by the European Bioinformatics Institute in Hinxton, UK, and the Center for Genomic Regulation in Barcelona.
"Every major geneticist I know has downloaded this, which is impressive," said Ewan Birney, director of the EBI. He called the UK Biobank dataset the "most extensive cohort" ever made available to researchers, given both its size and the extent of the phenotypic data collected.
Between 2006 and 2010, the UK Biobank collected blood, urine, and saliva samples from 500,000 volunteers, who also provided a variety of personal information for the effort, and agreed to allow the biobank access to their National Health Service electronic health records. The phenotypic data collected concerned participants' lifestyle, medical history, and sociodemographic indicators. The participants also underwent cognitive function and hearing tests and some had their physical activity monitored.
In addition, many of the subjects have been imaged. Andrew Trehearne, a spokesperson for the UK Biobank, which is based outside Manchester, said that the effort is currently imaging 100,000 participants' hearts, brains, and abdomens via MRI. Some are also undergoing retinal scans. He said that imaging data on 10,000 participants is already available to researchers. "Ultimately this will be another huge tranche of data for health research," Trehearne said.
Birney said that his personal research group at EBI is interested in organ systems, particularly the heart and eyes, and will use the genetic data, paired with the physiological measurements obtained from imaging, to learn more about human development and disease association.
"This is a cohort that has been consistently phenotyped," said Birney. "That has produced a really good baseline of many different measurements on this half million people," he said. "On top of that, there is the linkage of the UK Biobank to the NHS medical records," Birney added. "It really is the most extensive cohort with these features, baseline phenotype, and then the linkage to ongoing diseases worldwide."
That sentiment, that the UK Biobank data release is unique, is shared by some of those using it, who are enticed both by the breadth of the data available and by the ease of access to the resource. Given its size, they also see the opportunity to carry out scientific projects quickly, rather than sourcing different datasets from various cohorts worldwide that were often collected based upon a specific phenotype, such as a shared disease.
"Scientists have spent the last 10 years making steady progress to increase our understanding of the genetic variation that influences all human traits – including birth weight, obesity, smoking addiction, male baldness, cancer, diabetes, Alzheimer’s disease, and many others," said Timothy Frayling, a professor of human genetics at the University of Exeter Medical School in the UK who is using the new resource.
"We’ve done this by pooling genetic data from hundreds of studies around the world, each of a few hundred to a few thousand individuals, to generate total sample sizes of a few hundred thousand," he said. "Now that we have the UK Biobank data, we can perform analyses that previously took a few years in a few days."
Frayling's approved projects involve studying the genetic factors that influence type 2 diabetes and obesity. Although researchers around the globe are accessing the data, with projects approved from Australia, New Zealand, and Malaysia, for instance, British investigators make up the bulk of researchers approved to use the UK Biobank resource, with 14 projects approved for the University of Exeter alone. However, a spokesperson for EBI said that more than half of the applications to use the data are now coming from outside the UK with "huge interest" from North America and elsewhere in Europe.
Frayling said that British scientists are hoping the open model demonstrated by the UK Biobank will be adopted by biorepositories in other countries, rather than a more "protectionist" one.
"These analyses can be performed by any legitimate scientist from anywhere in the world – because the data is completely open access, with the only requirement being to show that you have the skills and expertise to analyse and interpret the data," said Frayling. He called that kind of access "unprecedented" and said that the UK Biobank's approach is the "ultimate in scientific democracy" that will result in faster delivery of benefits to patients and healthcare, versus what has been achievable to date with "more limited models of data sharing."
UK Biobank's Trehearne noted that because the UK Biobank is a public resource funded by the UK Medical Research Council and the Wellcome Trust, it is the organization's prerogative to make its data widely available. "The goal is to provide a resource that is as useful as it can be," he said.
Erik Ingelsson, a professor of medicine at Stanford University who has been using the resource, held a similar view to Frayling's, calling the dataset a "marvelous piece of work" that is "extremely rich" in terms of phenotypic data.
"It's a totally revolutionary way of doing it," Ingelsson said of the data release. "There are other good resources, but they are rather locked in, you need to either be on the inside or part of some select group that has collected the data."
"I think the UK Biobank has done the correct thing," he said. "It's very altruistic. They could have just said they were doing it all for themselves."
Ingelsson, who recently moved to Stanford from Uppsala University in Sweden, said he has had access to the limited release of the UK Biobank data since 2014, and is now using the full dataset for an array of projects related to cardiovascular disease, obesity, and diabetes. He noted that his lab uses the genetic data in diverse ways, from traditional epidemiology projects to genome-wide association studies to other projects looking at causality using genetics.
"It's a big game changer for anyone working with human genetics," said Ingelsson. "Many have used the interim release, but they have been anticipating this. There has been a lot of excitement." Ingelsson noted that as part of his arrangement with the UK Biobank, he and other researchers using the dataset must return to the UK Biobank all data generated in their projects, along with all code, algorithms, and other tools.
He also said he expected many more papers to come out based on the resource over time, adding that by making it available as an open resource, the scientific community would reap the maximum benefit.
"The use of the resource is more efficient this way because you have more minds thinking about important scientific questions," said Ingelsson. "I think it's a great design, and I think it will be very important for the human genetics field."
The EGA, which is handling the release of the genetics data related to the project, is prepared for a high level of interest. Thomas Keane, team leader for the EGA and archive infrastructure at EBI, said that the resource is already built to handle high-demand datasets such as those from the UK Biobank.
"For this data release, the technical challenge for the EGA was to have sufficient bandwidth to deliver [if necessary] up to 2.4 petabytes to researchers around the world in a short period of time," Keane said. He noted that since the EGA is federated between EBI, the CRG in Barcelona, and a third data site at Hemel Hempstead in the UK, the biobank can coordinate the most appropriate location for each research group.
According to EBI, half a petabyte of data was transferred in the first two weeks after the resource went online, and the EGA anticipates up to two petabytes of transfer to the research community over the next two months.
Birney noted that the UK Biobank ensured that the data was made available to all approved researchers at the same time, prior to the official release this month. For about a month, he noted, the data was available only in encrypted form. Then, in mid-July, all of the approved researchers were simultaneously provided with the key needed to decrypt it. The race to analyze the data and publish was on.
"For a long time, a lot of people have been doing disease-by-disease studies," commented Birney. "The UK Biobank really changes that equation because there are enough cases of common diseases in that half million cohort for one to draw good inferences," he said. "There will be lots of discoveries coming out of the UK Biobank."