NEW YORK (GenomeWeb News) – Researchers from the University of Washington and Agilent Technologies have published a proof-of-principle paper demonstrating the feasibility of exome sequencing for finding rare, disease-associated mutations.
The team used array-based sequence capture techniques and high-throughput sequencing to generate more than 300 million bases of sequence information representing a dozen human exomes. When they looked at the exomes — generated from eight HapMap individuals as well as four individuals with a condition called Freeman-Sheldon syndrome — they found a slew of genetic variants as well as mutations in the MYH3 gene that characterize Freeman-Sheldon syndrome. The research appeared in the advance online edition of Nature yesterday.
"We have … sought to develop second-generation methods for targeted sequencing of all protein-coding regions … to reduce costs while enriching for discovery of highly penetrant variants," senior author Jay Shendure, a researcher with the University of Washington's department of genome sciences, and his co-authors wrote.
The research was done as part of the Exome Project — an effort funded by the National Heart, Lung, and Blood Institute and the National Human Genome Research Institute to come up with cost-effective exome sequencing protocols that can eventually be applied to large, carefully phenotyped human populations.
The effort is based on the rationale that sequencing the one percent or so of the genome that codes for protein sequence will be much faster and cheaper than whole-genome sequencing, while still having the potential to find mutations involved in human disease.
"We have great hope that targeted sequencing, when applied to a larger number of individuals, will be used to discover the genetic underpinnings of common conditions such as high blood pressure and high cholesterol," NHLBI Director Elizabeth Nabel, who was not directly involved in the study, said in a statement. "The current findings provide the fundamental groundwork for pursuing this important goal."
Shendure and his team decided to test this approach using DNA samples obtained from eight individuals in the HapMap project and four individuals with Freeman-Sheldon syndrome, also known as distal arthrogryposis type 2A, a rare disorder that's inherited in an autosomal dominant fashion.
In order to sequence just the exome for each individual, the researchers captured the exons by hybridizing genomic DNA shotgun libraries to seven microarrays and sequencing the captured regions using unpaired Illumina Genome Analyzer II reads.
Earlier this spring, Shendure and his colleagues reported in Nature Methods that they had improved on existing multiplexed exon-capture techniques, establishing a protocol that could be used in exome sequencing.
The team generated roughly 6.4 gigabases of sequence data for each person, on average. Nearly half of the 76-base-pair reads mapped to targeted regions of the genome. After tossing out duplicated reads, the researchers found that they had achieved about 51-fold coverage of each exome, on average, covering about 99.7 percent of the targeted sequences.
When they started analyzing the exomes, the researchers found 56,240 coding SNPs that were present in at least one of the individuals tested, 13,347 of which had not been identified in the past. Each exome contained about 17,272 coding SNPs, on average, they reported.
After cataloging various types of mutations, such as missense, nonsense, splice site mutations and indels, the researchers turned their attention to Freeman-Sheldon syndrome with an eye towards determining how well exome sequencing could find the culprit gene in monogenic diseases.
Indeed, by homing in on genes that carried variants in all four individuals with the syndrome, and counting only variants that weren't found in the dbSNP database, the team could accurately pick out MYH3 as the candidate gene in Freeman-Sheldon syndrome.
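The filtering logic described here can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `candidate_genes`, the data layout, and every variant label and gene name other than MYH3 are hypothetical, made up purely for illustration. The sketch assumes each person's variants are given as (gene, variant) pairs and that a set of previously catalogued variants stands in for dbSNP.

```python
from typing import Dict, Set, Tuple

# A variant is modeled as (gene symbol, variant label); both are toy values.
Variant = Tuple[str, str]


def candidate_genes(
    affected: Dict[str, Set[Variant]],
    known_variants: Set[Variant],
) -> Set[str]:
    """Return genes with at least one novel variant in every affected person."""
    genes_per_person = []
    for variants in affected.values():
        # Drop variants already catalogued (the stand-in for dbSNP).
        novel = variants - known_variants
        genes_per_person.append({gene for gene, _ in novel})
    if not genes_per_person:
        return set()
    # Keep only genes hit by a novel variant in all affected individuals.
    return set.intersection(*genes_per_person)


# Hypothetical toy data: four affected individuals, invented variant labels.
affected = {
    "case1": {("MYH3", "v1"), ("GENE_A", "a1"), ("GENE_B", "b1")},
    "case2": {("MYH3", "v2"), ("GENE_A", "a1")},
    "case3": {("MYH3", "v1"), ("GENE_B", "b1")},
    "case4": {("MYH3", "v3"), ("GENE_A", "a1"), ("GENE_B", "b2")},
}
known = {("GENE_A", "a1"), ("GENE_B", "b1")}  # stand-in for a dbSNP lookup

print(sorted(candidate_genes(affected, known)))  # prints ['MYH3']
```

The intersection is taken at the gene level rather than the variant level, so unrelated individuals who carry different novel mutations in the same gene still point to that gene.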
While the researchers conceded that the method cannot detect structural changes or mutations in non-protein-coding parts of the genome, they expressed enthusiasm about the possibility of identifying disease-related mutations based on examining sequence from just a few affected, unrelated individuals.
"[O]ur analysis suggests that direct sequencing of exomes of small numbers of unrelated individuals (but more than one) with a shared monogenic disorder can serve as a genome-wide scan for the causative gene," the authors wrote. "The availability of eight HapMap exomes was clearly helpful, suggesting that the power of this approach will improve as the 1000 Genomes Project generates a catalogue of common variation that is more complete and evenly ascertained than dbSNP."