By Monica Heger
This story has been updated from a previous version to clarify comments made by Javier Gracia-Aznarez.
Using a targeted resequencing approach, researchers from the Spanish National Cancer Research Center in Madrid have identified eight variants that could play a role in hereditary breast cancer not caused by BRCA1 or BRCA2 mutations.
Their results, detailed this month in PLoS ONE, suggest that the approach (sequencing, at a high depth, chromosomal regions identified by linkage tests) may be a good method for identifying genes in hereditary disease. Indeed, other groups are using similar approaches, including a group from Scripps Research Institute that has used results from genome-wide association studies to guide targeted sequencing studies of obesity and anorexia patients.
The Spanish team analyzed samples from 20 breast cancer patients from nine different families without BRCA1 or BRCA2 mutations, and from four healthy, unrelated individuals with no family history of breast cancer. They first used an exon-capture technique to target two regions on two different chromosomes. The regions contain 128 known genes and had previously been implicated as candidates for containing breast cancer susceptibility genes.
They then barcoded and pooled samples, sequencing four samples per lane, to around 33-fold coverage on the Illumina Genome Analyzer using a single-end sequencing strategy and 36-base reads. They generated over 100 million reads from the affected individuals, about 40 million of which could be aligned to the candidate regions.
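The read counts quoted above imply a rough on-target rate, which can be worked out in a few lines. This is a back-of-the-envelope sketch using only the approximate figures in this article. The lane count additionally assumes that all 24 samples (20 patients plus four controls) were pooled four to a lane, which the article does not state outright.

```python
import math

# Approximate figures quoted in the article.
total_reads = 100_000_000      # "over 100 million" reads, so a lower bound
on_target_reads = 40_000_000   # "about 40 million" aligned to the candidate regions
samples_per_lane = 4

# Assumption: patients and controls were all pooled the same way.
n_samples = 20 + 4

# Because total_reads is a lower bound, this fraction is an upper bound.
on_target_fraction = on_target_reads / total_reads
lanes_needed = math.ceil(n_samples / samples_per_lane)

print(f"on-target fraction: at most about {on_target_fraction:.0%}")
print(f"lanes needed under the pooling assumption: {lanes_needed}")
```

In other words, well under half of the reads landed in the captured regions, which is one reason high raw read counts are needed to reach the target depth.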
Next, the team called variants, keeping only those located in the regions of interest, which yielded an average of 71 SNPs per individual. Filtering out homozygous variants and comparing the remainder with controls and with other members of the family eliminated around 80 percent of the SNPs. Finally, they narrowed the list to nine previously undescribed SNPs, each carried by every member of the family in which it was found and each affecting a protein-coding gene. Eight of those nine SNPs were confirmed by Sanger sequencing.
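The filtering logic described above can be sketched in code. This is an illustration only, not the authors' pipeline: the `Variant` record, its field names, the `candidate_snps` helper, and the toy variant keys are all hypothetical, and the filters are applied in a single pass rather than in the staged order the team used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    key: str              # hypothetical identifier, e.g. "chrA:100:A>G"
    heterozygous: bool    # homozygous calls are filtered out
    known: bool           # True if the SNP has been described before
    in_coding_gene: bool  # True if it affects a protein-coding gene


def candidate_snps(family_calls, control_keys):
    """Return keys of SNPs that pass every filter in all family members.

    family_calls: dict mapping family member -> list of Variant calls
                  (assumed to be restricted to the regions of interest already)
    control_keys: set of variant keys observed in the healthy controls
    """
    shared = None
    for calls in family_calls.values():
        kept = {
            v.key
            for v in calls
            if v.heterozygous
            and not v.known
            and v.in_coding_gene
            and v.key not in control_keys
        }
        # Keep only variants carried by every member seen so far.
        shared = kept if shared is None else shared & kept
    return sorted(shared or [])


# Toy data for one family with two sequenced members.
family = {
    "patient_1": [
        Variant("chrA:100:A>G", True, False, True),
        Variant("chrA:200:C>T", True, False, True),   # also seen in a control
        Variant("chrA:300:G>A", False, False, True),  # homozygous
    ],
    "patient_2": [
        Variant("chrA:100:A>G", True, False, True),
        Variant("chrA:200:C>T", True, False, True),
    ],
}
controls = {"chrA:200:C>T"}

candidates = candidate_snps(family, controls)
print(candidates)
```

In this toy family only one variant survives: the second is removed because it also appears in a control, and the third because it is homozygous and absent from the other family member.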
Once all the filters were applied, not every family had a SNP, and no SNP was shared among families. "It would have been more convenient to find one or several that were shared," said Javier Gracia-Aznarez, an author of the paper and postdoc in the human genetics group at the Spanish National Cancer Research Center.
Gracia-Aznarez said one possibility for not finding shared SNPs could be that they needed to sequence more families.
Another possible explanation is that the variants fall on low-penetrance breast cancer susceptibility genes, said Gracia-Aznarez. "This could be the reason why non-BRCA1 and 2 families don't have a clear candidate [gene] yet, despite the fact that they've been studied for a long time," he added.
The team also looked for indel variants, but after applying their filters did not find any potential candidates. "This could be due to the fact that indel discovery on single-end data is not as accurate as with the new paired-end technology," the authors wrote. "Similarly, we cannot discard the possibility of missing the existence of large rearrangements due to the limitations of single-end data."
Other groups are using similar approaches to identify potential gene targets in complex hereditary disease. Ashley Scott, who presented at the XGen Congress meeting last month, is working with a group at Scripps Research Institute that is doing targeted sequencing on an anorexia cohort.
She said the breast cancer study illustrates a lot of the issues that targeted resequencing studies are facing. "This type of analysis to identify SNPs isn't done routinely quite yet, so people are patching together different approaches," she said. Groups are using different techniques for calling variants and applying filters to narrow down the list of SNPs, but there isn't a standardized method, nor have the different methods been directly compared. "Until the field reaches a consensus, we’re going to see all these different types of filtering," she said. In particular, she thought it was interesting that the team chose to discard the variants in the intronic regions, which could contain potentially interesting information.
She added that the breast cancer study also shows that these sequencing studies are not endpoints in figuring out the relevant mutations. "Instead, these studies raise a lot more questions and a lot more stuff to look at," she said. For instance, while the researchers were able to narrow their SNP list down to eight candidates, doing functional studies on those eight candidates will be time-consuming, potentially taking years. And it will be difficult to prioritize which SNP to study first, she added.
In the anorexia study, Scott and the Scripps team used results from genome-wide association studies to target and sequence specific areas of the genome from 400 anorexia cases and 100 controls, using RainDance's capture method and Life Technologies' SOLiD machine.
Five hundred people is too large a cohort for whole-genome sequencing, and the team opted not to do exome sequencing because it also wanted to study promoter regions, Scott said. So they are targeting 150 genes identified by previous genome-wide association studies.
Scott said they don't have any results yet, but in another study at Scripps, researchers used a targeted sequencing approach to study an obesity cohort, and found the majority of their variants within a kilobase of two of the candidate genes that had been implicated by genome-wide association studies.
While the cost of whole-genome sequencing continues to drop, Scott and Gracia-Aznarez both said it is still too expensive for sequencing large cohorts. "This will continue to be a valid approach for studying other types of disease," said Gracia-Aznarez.