This article has been updated from a previous version to clarify the clinical phenotypes to be studied in WTCCC2.
A year after the Wellcome Trust Case Control Consortium published its findings from a large-scale genome-wide association study that identified genomic regions associated with seven common diseases, consortium scientists have launched resequencing studies to follow up on the results, hoping to identify causative genetic variants.
After resequencing genomic regions implicated in different diseases in 32 HapMap samples, scientists at the Wellcome Trust Sanger Institute are now resequencing genomic regions in disease samples, each study involving 40 cases and 40 controls, using a combination of PCR-based amplification and Solexa sequencing.
The Wellcome Trust Case Control Consortium, which involved more than 50 research groups across the UK, published its results a year ago in Nature. The scientists genotyped a total of 17,000 samples using Affymetrix 500K microarrays, which queried 500,000 SNPs. For each of the seven diseases, they assayed 2,000 samples, in addition to 3,000 shared controls. The diseases studied by the consortium were type 1 and type 2 diabetes, Crohn’s disease, coronary heart disease, hypertension, bipolar disorder, and rheumatoid arthritis.
The associations found in this genome-wide study each “point to a particular region in the genome and say there is some interesting variant in this region,” according to Peter Donnelly, director of the Wellcome Trust Center for Human Genetics in Oxford and chair of the WTCCC. These regions range in length from several dozen kilobases to more than a megabase of DNA, he said.
But to identify rare variants within these regions, and to find variants that contribute directly to the disease, more work needs to be done, including “substantial resequencing” of the regions, he said. The Wellcome Trust has granted approximately £7 million ($13.8 million) in total funding for WTCCC follow-up studies, of which £750,000 is currently allocated for resequencing studies.
As a prelude to sequencing disease samples, scientists at the Wellcome Trust Sanger Institute started out by resequencing a total of 2.7 megabases of DNA from approximately 20 regions implicated in different diseases in 32 HapMap CEPH samples, which are of Northern European origin.
“The purpose of that was to get a better understanding of the allelic diversity in those regions, because obviously, our knowledge of the sequence variation is rather thin at the moment,” Aarno Palotie, a senior investigator and head of medical sequencing at the Sanger Institute, told In Sequence this week.
Starting out with HapMap samples rather than disease samples was “a very wise thing to do,” he said, because these samples were readily available in large enough quantities, and the study, which has not been published yet, could quickly be completed.
For this study, which used Illumina’s Genome Analyzer for sequencing, the scientists amplified the target regions by PCR, using a “very efficient PCR pipeline” they set up, according to Palotie. “At the moment, that is still the cheapest way of producing targets for sequencing,” even with new capture, or pull-down, methods coming online, he said.
“There is some room for improvement for the companies who have developed these pull-down techniques to be competitive with their pricing structure,” said Palotie.
So far, he and his colleagues have been using NimbleGen’s array-based capture technology because it was the first commercially available one, but “our plan is to test out several [other] ones.” He said other companies working on genome-selection technologies include Agilent Technologies, FlexGen, and Febit.
What is important for any of these is their cost, flexibility, and effectiveness, he said, as well as their ability to be implemented in “a true pipeline” for studying large numbers of samples.
Next in line are resequencing studies for “as many as possible” of the seven WTCCC diseases, several of which are currently ongoing.
Although the current studies use PCR to amplify the regions, the researchers hope to switch to one of the new pull-down methods in upcoming studies because the amount of DNA from the disease samples is limited.
According to Palotie, “one big drawback with PCR is that it is really DNA-hungry,” consuming as much as 30 micrograms of DNA for large regions. “So what we are looking forward to with some of the capture methods is that [the companies] develop protocols which save DNA,” he said.
Each of the studies involves using Illumina’s Genome Analyzer to sequence about 2 megabases of DNA from several regions in 40 disease samples and 40 controls with the aim of discovering “as much variation as possible” in these regions, according to Donnelly.
Selecting 40 cases from the 2,000 available samples is a challenge, Palotie said. “For each region, you select for a risk haplotype, and then for the protective haplotype, so it becomes quite a puzzle.”
Donnelly said the researchers are exploring different strategies for choosing which individuals to resequence, and have recently performed simulation studies to become acquainted with the task. “We are trying to use the information from the association study to pick individuals for resequencing at a particular locus, to increase our chances of finding variants that might be causative for the disease,” he said.
As follow-up studies for findings from the WTCCC are getting underway, a second large-scale genotyping project is already taking shape.
In April, the Wellcome Trust said it is providing £30 million to fund WTCCC2 and 12 independent consortia, which together involve 60 research institutions within and outside the UK. In a follow-up to the first WTCCC study, these groups will analyze 120,000 samples from 25 diseases and other traits.
According to Leena Peltonen, head of human genetics at the Sanger Institute and deputy chair of WTCCC2, the clinical phenotypes to be studied include multiple sclerosis, learning difficulties, and other disorders.
Most of the funding will go towards arrays and reagents for genome-wide association studies, she told In Sequence last week, and “not a huge amount of sequencing” will be involved in the initial study.
Last week, Illumina said that its Infinium HD BeadChips will be used to analyze 90,000 of the samples, and its Infinium HD Human1M-Duo BeadChips to study 6,000 controls. However, according to Peltonen, the consortium will also use Affymetrix arrays, and the use of genotyping platforms will be “pretty much evenly split between Illumina and Affy.”
As with the first WTCCC, Peltonen predicts that resequencing studies will follow the genotyping scans, aiming to expose rare variants as well as certain structural variations.
“One doesn’t need a crystal ball to predict there will be a flow of sequencing proposals submitted, and definitely, there will be follow-up by sequencing,” she said.