
With Analysis of 20 Genomes, Duke Team Demonstrates Proof of Concept for Sequencing Disease Cohorts


By Monica Heger

Researchers at Duke's Genomic Analysis Facility have sequenced and characterized the genomes of 20 individuals: 10 with hemophilia A that were also exposed to HIV, and 10 without hemophilia and not exposed to HIV.

The study — the largest comparison of whole human genomes published to date — served as a proof of concept for using next-generation sequencing to identify rare, highly penetrant functional variants by showing that the disease-causing gene for hemophilia A, Factor VIII, was "easily" identifiable.

The study also produced several other interesting and surprising findings: most humans appear to have around 165 genes that are completely knocked out, and the number of unique single nucleotide variants (SNVs) per individual seems to level off at around 144,000 once the first 15 genomes have been sequenced, said Kevin Shianna, director of Duke's Genomic Analysis Facility within the Center for Human Genome Variation and a co-lead author of the study.
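The leveling-off Shianna describes concerns how variants accumulate as genomes are added to a cohort. One way to track it is to count, for each newly added genome, the variants not seen in any earlier genome. Below is a minimal sketch of that bookkeeping. It assumes each genome's SNV calls are available as a set of hypothetical `(chrom, pos, ref, alt)` tuples. The function name and data layout are illustrative and are not the Duke team's pipeline.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def novel_variants_per_genome(genomes: Iterable[Set[Variant]]) -> List[int]:
    """For each genome, in order, count the variants not seen in any earlier genome."""
    seen: Set[Variant] = set()
    counts: List[int] = []
    for variants in genomes:
        counts.append(len(variants - seen))  # variants new to the cohort so far
        seen |= variants  # fold this genome's calls into the running union
    return counts


if __name__ == "__main__":
    g1 = {("1", 100, "A", "G"), ("1", 200, "C", "T")}
    g2 = {("1", 100, "A", "G"), ("2", 50, "G", "A")}
    g3 = {("1", 200, "C", "T"), ("2", 50, "G", "A")}
    print(novel_variants_per_genome([g1, g2, g3]))  # [2, 1, 0]
```

On real data, the order in which genomes are added shapes this curve. Averaging the counts over several random orderings is a reasonable refinement before reading off where they stop falling.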

The study, published this month in PLoS Genetics, is part of a larger effort at Duke that aims to sequence around 200 genomes and 500 exomes from various disease cohorts including schizophrenia, epilepsy, amyotrophic lateral sclerosis, and HIV, Shianna said.

In the recent study, the researchers used the Illumina Genome Analyzer to sequence 10 individuals with hemophilia A who had also been exposed to HIV and 10 controls, each to around 30-fold coverage. All the sequencing was paired-end. Read lengths grew from around 36 base pairs at the start of the project to 75 base pairs by its end. The team identified around 3.5 million SNVs and 610,000 indels per genome, and nearly 90 percent of the SNVs were already in dbSNP. They also found an average of 338 deletions and 411 duplications per genome, with an average size of around 34 kilobases.
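A 30-fold target translates into a read count through the usual relationship: coverage = (number of reads × read length) / genome size. The same relationship shows why moving from 36 bp to 75 bp reads mattered. The back-of-the-envelope sketch below assumes a haploid human genome of roughly 3.1 billion bases and counts each read of a pair separately. The genome size and function name are assumptions for illustration. Real projects sequence extra to offset duplicate and unmappable reads.

```python
def reads_needed(coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Reads required so that reads * read_length >= coverage * genome_size.

    Each end of a paired-end fragment counts as one read. This is raw coverage:
    it ignores duplicates, unmapped reads and uneven coverage.
    """
    total_bases = coverage * genome_size_bp
    # Integer ceiling division avoids floating-point rounding on large numbers.
    return -(-total_bases // read_length_bp)


GENOME_SIZE_BP = 3_100_000_000  # assumed approximate haploid human genome size

if __name__ == "__main__":
    for read_length in (36, 75):
        n = reads_needed(30, GENOME_SIZE_BP, read_length)
        print(f"{read_length} bp reads: {n:,} reads ({n // 2:,} pairs)")
```

Under these assumptions, 30-fold coverage with 75 bp reads needs about 1.24 billion reads (620 million pairs). At 36 bp, roughly 2.58 billion reads would be needed for the same depth.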

They identified causal mutations in the Factor VIII gene, the gene responsible for hemophilia A, in six of the 10 hemophilia cases. The authors were unable to find the causal mutation in the remaining four cases, however. They attributed this to the fact that hemophilia A can be caused by large inversions near Factor VIII, which are difficult to detect with short-read sequencing technology.

The researchers also developed a software pipeline to evaluate the functional potential of each of the variants. Using the software, they were able to identify variants that resulted in the truncation of the Factor VIII protein. On average, each genome had about 165 homozygous variants that were protein truncating or resulted in the loss of a stop codon. They also identified 563 variants across the 20 genomes that were predicted to cause premature stops, loss of a stop codon, or a frameshift change in the coding regions of 484 genes. Additionally, 21 of those variants, located in 20 genes, were found in all 20 genomes.
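The article does not describe how the Duke pipeline works internally. The three consequence classes it reports (premature stop, stop loss and frameshift) can, however, be illustrated on a single coding sequence. The toy sketch below assumes a forward-strand coding sequence that starts at the start codon, with VCF-style variants given as a 0-based position plus reference and alternate alleles. All names are hypothetical. Real annotation tools also handle multiple transcripts, strand and splice sites. A second helper flags the homozygous case, which corresponds to a complete knockout.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
LOSS_OF_FUNCTION = {"stop_gained", "stop_lost", "frameshift"}


def classify_coding_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Classify a variant in an in-frame, forward-strand coding sequence (toy model).

    cds: reference coding sequence beginning at the start codon.
    pos: 0-based offset of the variant within cds.
    ref, alt: reference and alternate alleles anchored at pos (VCF-style).
    """
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the coding sequence")

    # Insertions/deletions: a length change that is not a multiple of 3 shifts the frame.
    if len(ref) != len(alt):
        return "frameshift" if (len(alt) - len(ref)) % 3 else "inframe_indel"

    # Substitutions: compare every codon the variant touches before and after.
    mutated = cds[:pos] + alt + cds[pos + len(ref):]
    first = (pos // 3) * 3
    last = ((pos + len(ref) - 1) // 3) * 3
    for i in range(first, last + 1, 3):
        ref_codon, alt_codon = cds[i:i + 3], mutated[i:i + 3]
        if alt_codon in STOP_CODONS and ref_codon not in STOP_CODONS:
            return "stop_gained"  # premature stop truncates the protein
        if ref_codon in STOP_CODONS and alt_codon not in STOP_CODONS:
            return "stop_lost"  # translation reads past the normal end
    return "other"


def is_homozygous_lof(consequence: str, genotype: tuple) -> bool:
    """True for a loss-of-function call carried on both alleles, e.g. genotype (1, 1)."""
    return consequence in LOSS_OF_FUNCTION and genotype[0] == genotype[1] != 0
```

For example, in the toy sequence `ATGAAATGGTAA`, changing the G at offset 8 to A turns the `TGG` codon into the stop codon `TGA`. The function classifies this as `stop_gained`.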

The genes most commonly knocked out belonged to the olfactory receptor family, said Shianna. He said this made sense given the sheer number of olfactory genes and their high variability.

The finding also suggested that future studies of knocked out genes in humans could help identify genes that are not necessary for survival. "We'll soon have 10,000 genomes sequenced and once you start putting that data together the population genetics will be pretty powerful and interesting," Shianna said.

For instance, he said, it could eventually be possible to identify every gene that is not necessary for viability, making it easier to pinpoint disease-causing genes. There could also be cases where a gene is unnecessary for survival but its absence may still increase disease susceptibility, he added.

Shianna said the group is now using similar methods to sequence large cohorts in other diseases including schizophrenia, epilepsy, ALS, and HIV. The lab is equipped with 12 Illumina GAs, as well as four HiSeq 2000 instruments, and "for the last two and a half years the machines haven't stopped running," Shianna said.

Under grants from the National Institutes of Health, the group will sequence 100 schizophrenia cases, 50 epilepsy cases, and 40 ALS cases. So far, they have completed the sequencing of around 40 schizophrenia cases and 30 epilepsy cases, Shianna said. They have also done whole-exome sequencing of around 150 epilepsy patients.

The team has begun to analyze the results. He said they have found some interesting variants, but it is too early to discuss them. "Now we're choosing the most interesting variants and will genotype 2,000 to 3,000 controls to see what the allele frequency is in the controls," he said.
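Genotyping controls comes down to estimating how common each candidate allele is outside the patient group. The minimal sketch below assumes diploid genotypes coded as alternate-allele counts (0, 1 or 2), with None for a failed call. The function name and coding scheme are illustrative assumptions, not the lab's tooling.

```python
from typing import Iterable, Optional


def alt_allele_frequency(dosages: Iterable[Optional[int]]) -> Optional[float]:
    """Alternate-allele frequency across diploid samples, skipping missing calls.

    Each sample contributes two alleles, so frequency = alt alleles / (2 * called samples).
    Returns None when no sample has a call.
    """
    called = [d for d in dosages if d is not None]
    if any(d not in (0, 1, 2) for d in called):
        raise ValueError("diploid dosages must be 0, 1 or 2")
    if not called:
        return None
    return sum(called) / (2 * len(called))
```

For example, `alt_allele_frequency([0, 0, 1, 2, None])` returns 0.375: three alternate alleles among the eight alleles of the four called samples. A candidate variant that turns out to be common among thousands of controls is a less likely explanation for a rare, highly penetrant phenotype.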

The team is also continuing to use sequencing to study extreme phenotypes in HIV. It is comparing HIV-positive patients who progress very quickly to full-blown AIDS with HIV-positive patients who maintain very low viral levels.

Aside from the disease sequencing projects, the group is also collaborating with pharmaceutical companies to study adverse drug reactions and severe allergy, although Shianna declined to provide specifics on the projects.

Shianna said that whole-genome sequencing is becoming an increasingly useful tool for his lab, particularly as costs decline and the technology improves. While the group is continuing to use whole-exome sequencing, he said that it will likely move toward whole-genome sequencing in the future.

"Over the next six to nine months we'll start doing 100 to 200 samples of whole-genome sequencing," he said. "Exome sequencing will still have a place for us, when we have several thousand samples. But, it's become more of a capacity issue, rather than a cost issue," he said.

Shianna added that when the team began the experiment, sequencing a genome to 30-fold coverage cost around $150,000 for reagents, processing, and data quality control. Now, sequencing an entire genome costs around $15,000 on the GA and $7,000 on the HiSeq, he said. And, sequencing an exome on the GA costs around $2,500, which includes the capture step and sample prep.

Our approach is "we'll take any good phenotype we can get, and sequence it," he said.
