
SALT LAKE CITY (GenomeWeb) – An international team is amassing and analyzing variant data in a multitude of human exomes, attendees at the American College of Medical Genetics and Genomics annual meeting heard yesterday, producing a resource that's already proving useful in the interpretation of causal SNPs and penetrance patterns.
Speaking at the presidential plenary session, Daniel MacArthur, a researcher affiliated with Massachusetts General Hospital, Harvard Medical School, and the Broad Institute, offered a glimpse at some of the functional information being gleaned from tens of thousands of human exomes.
In particular, he pointed to a previously proposed liver disease risk gene that may have been exonerated with the accumulated exome data, along with newly available penetrance estimates for variants in a gene implicated in prion disease that were produced with the help of variants identified in the massive protein-coding sequence set.
The work is being done by the Exome Aggregation Consortium, or ExAC, a group that hopes to use patterns in as many human exomes as possible not only to decode individual disease cases but also to tease out authentic disease culprits and define the risk they pose to carriers.
In the past, it has been tricky to amass and standardize exomes, which are often generated through independent efforts using inconsistent processes and analytical protocols, noted MacArthur, who leads the consortium.
With that in mind, he and his colleagues in the consortium have been drawing from multiple case-control studies of common, complex human diseases over the last few years to bring together information on 92,000 exomes so far.
The consortium has processed the complete exome set with a standard variant calling pipeline, using a system developed to scale up the number of exomes that can be considered together when searching for genetic variants.
After removing exome sequences from individuals with severe childhood disorders, other potentially Mendelian conditions, and their relatives, the researchers were left with a core set of more than 60,000 exomes, which make up the ExAC exome reference set.
The sequenced individuals in the reference exome group are not necessarily healthy, MacArthur said, since many participants were recruited for common disease studies. Even so, the Mendelian disease frequency is expected to be comparable to that in the general population.
The dataset is a leap forward in both its size and geographical diversity compared to the exome-based variant information that was previously available from efforts such as the 1000 Genomes Project or the National Heart, Lung, and Blood Institute's Exome Sequencing Project, MacArthur explained.
In a principal component analysis of the ExAC data, for example, the team saw genetic clusters that coincided with European, African, Latino, South Asian, East Asian, and other populations. The consortium is still missing large-scale exome sequence data for individuals from Middle Eastern populations, but hopes to obtain them in the future.
Those involved in the effort expect the large-scale exome sequence data to prompt more accurate clinical interpretations of risk variants and their impacts.
To illustrate such utility, MacArthur pointed to a missense mutation in the CIRH1A gene that was previously implicated in liver disease risk in individuals from an Ojibway Cree population in Quebec through the Native American Indian Childhood Cirrhosis project.
As it turned out, the same SNP turned up in ExAC data from three individuals from a Latino population for whom phenotypic data were available. Though two of the individuals were diabetic, none suffered from unusual liver traits or symptoms.
And because no other mutations in CIRH1A have been linked to liver problems during more than a decade of follow-up studies and functional analysis, MacArthur noted that the risk SNP proposed initially may have been a false-positive result, with the original SNP potentially tagging another, as yet unidentified, causal variant.
The ExAC collection can provide insight into the penetrance of variants in well-documented disease genes, too, he noted. For instance, the team saw far more individuals than anticipated in the ExAC exomes, and in data from collaborators at 23andMe, who carried mutations in PRNP, a gene linked to prion disease.
Nevertheless, the types of mutations found in individuals with documented prion disease often differed from those in unaffected controls, which helped distinguish fully penetrant variants from harmless ones.
Falling between them were mutations with intermediate penetrance that could be estimated based on prion disease frequency data, MacArthur explained.
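One common way to turn frequency data into a penetrance estimate is Bayes' rule: the chance that a carrier develops disease equals the baseline disease risk multiplied by the variant's carrier frequency among cases, divided by its carrier frequency in a reference population. The sketch below illustrates that arithmetic only. It is not necessarily the calculation the ExAC team performed, the function name `estimate_penetrance` is hypothetical, and every number in the usage example is a placeholder rather than a reported figure.

```python
def estimate_penetrance(baseline_risk, carriers_in_cases, total_cases,
                        carriers_in_reference, total_reference):
    """Estimate P(disease | carrier) with Bayes' rule.

    P(disease | carrier) = P(disease) * P(carrier | disease) / P(carrier)

    All inputs are assumptions supplied by the caller; nothing here is
    taken from the ExAC data itself.
    """
    if total_cases <= 0 or total_reference <= 0:
        raise ValueError("cohort sizes must be positive")
    if carriers_in_reference <= 0:
        # With no reference carriers the ratio is undefined.
        raise ValueError("need at least one carrier in the reference set")

    freq_in_cases = carriers_in_cases / total_cases
    freq_in_reference = carriers_in_reference / total_reference

    penetrance = baseline_risk * freq_in_cases / freq_in_reference
    # A probability cannot exceed 1; values above that suggest the variant
    # is effectively fully penetrant or that the inputs are noisy.
    return min(penetrance, 1.0)


if __name__ == "__main__":
    # Placeholder inputs for illustration only (not reported values).
    estimate = estimate_penetrance(
        baseline_risk=0.0002,        # assumed lifetime disease risk
        carriers_in_cases=10,        # hypothetical carriers among cases
        total_cases=10_000,          # hypothetical case cohort size
        carriers_in_reference=6,     # hypothetical carriers in reference set
        total_reference=60_000,      # hypothetical reference set size
    )
    print(f"Estimated penetrance: {estimate:.4f}")
```

Under this framing, a variant that is about as common in the reference set as in cases comes out with penetrance near the baseline risk, while one that is far more common in cases approaches 1, which matches the spectrum from harmless to fully penetrant described above.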
Finally, he noted that the ExAC collection is expected to offer insight into parts of the protein-coding genome that are relatively impervious to change due to selective constraints.
ExAC announced a public data release last fall at the American Society of Human Genetics meeting in San Diego. The ExAC browser is designed to show not only a variant and its expected impact on a gene, but also the variant's frequency in different populations and the quality of the variant call.