Researchers at Duke University's Institute for Genome Sciences and Policy plan to perform several human whole-genome sequencing studies in the areas of HIV, epilepsy, and schizophrenia, In Sequence has learned.
The first of these studies, which aims to find genetic variants that protect individuals from becoming infected by HIV, is already underway.
Using Illumina's Genome Analyzer, the Duke team plans to sequence and analyze the genomes of at least 50 individuals in-house by the end of the year. In another study, scheduled to start later this year, they plan to sequence the genomes of at least 60 individuals with a different phenotype, most likely HIV-infected patients who progress quickly to AIDS.
But these projects are only two of several human genome sequencing studies the researchers are planning. "This is very much only a beginning. We plan to be done with this relatively soon, and we have plans for even larger whole-genome sequencing studies," David Goldstein, director of the IGSP's Center for Human Genome Variation, who leads the HIV studies, told In Sequence last week. Those planned studies will also cover epilepsy and schizophrenia.
The first HIV study builds on a genome-wide association study by Goldstein and his colleagues, published in Science in 2007, in which they identified common variants that enable patients infected with HIV-1 to control their viral load.
Since then, "we have come to the view that a lot of important differences are likely to be rare, so we decided we wanted to pursue the rare high-penetrance contributors," Goldstein said.
The best cohort for such a study, he explained, is probably hemophilia patients who were exposed to HIV through a contaminated blood product but did not become infected.
"So we know that the genomes of those individuals must have something in them that protects them," he said. "And subsequent to that, the sequencing technologies advanced fast enough for it to become a feasible research project."
The "basic strategy" is to recruit up to 1,000 patients over the next two years "and to sequence as many of these individuals as we can afford to," Goldstein said. At that point, variants "that look suggestive and interesting" will be genotyped in a larger cohort, he added.
The sequencing data will be generated at the IGSP's genotyping facility, which currently has two Illumina Genome Analyzers installed and will have five additional systems set up this week.
"When everything is up and running, we will be going at a clip of probably close to two genomes a week," Goldstein said.
According to Kevin Shianna, director of the genotyping facility (soon to be renamed the genome-analysis facility), the aim is to generate about 25-fold coverage per genome initially.
"As the technology increases in throughput, we will likely increase that to 30-fold or above 30-fold," he said.
The researchers started to sequence the first genome a couple of months ago with 36-base paired-end reads, he said, "but going forward, now that we have the additional machines, the minimum we would want is to do 72-base [paired-end reads]."
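To see what a 25-fold target means in practice, the standard Lander-Waterman relation, coverage = (number of reads × read length) / genome size, can be turned around to estimate how many read pairs a genome needs. The sketch below assumes a haploid human genome of about 3.1 billion bases and ignores duplicate, unmapped, and low-quality reads, so real runs would need more data; the function name is illustrative only.

```python
import math

# Assumption: approximate haploid human genome size, in bases.
GENOME_SIZE_BP = 3_100_000_000


def read_pairs_needed(target_coverage: float, read_length: int,
                      genome_size: int = GENOME_SIZE_BP) -> int:
    """Estimate the paired-end read pairs needed for a target mean coverage.

    Uses coverage = (reads * read_length) / genome_size and assumes every
    sequenced base is usable (no duplicates, unmapped or filtered reads).
    """
    bases_per_pair = 2 * read_length  # both ends of each fragment are read
    total_bases = target_coverage * genome_size
    return math.ceil(total_bases / bases_per_pair)


if __name__ == "__main__":
    for coverage, length in [(25, 36), (25, 72), (30, 72)]:
        pairs = read_pairs_needed(coverage, length)
        print(f"{coverage}x with 2 x {length}-base reads: {pairs:,} read pairs")
```

Under these assumptions, 25-fold coverage corresponds to roughly 77.5 billion sequenced bases per genome, and doubling the read length from 36 to 72 bases halves the number of read pairs required for the same coverage.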
Last summer, the scientists tested the genome-sequencing capabilities of the Genome Analyzer by sequencing flow-sorted chromosome 6 from two cell lines. "We were happy with the paired-end data from that, and that pretty much made the decision that we could do the whole genome," Shianna said.
He declined to reveal the anticipated costs of sequencing the samples, citing a confidentiality agreement with Illumina. The study is supported by a $3 million grant from the Bill and Melinda Gates Foundation, according to an Illumina statement, but Shianna said that sequencing the 50 genomes will cost less than that.
The second HIV study, under which the scientists will sequence 60 genomes, is supported by an undisclosed amount of funding from the National Institute of Allergy and Infectious Diseases.
To analyze the sequence data from this and other projects, researchers at the center have developed a suite of bioinformatics programs, called Sequence Variant Analyzer, that will allow them to identify all genetic variants likely to affect function, such as indels that disrupt genes or non-conservative amino acid substitutions.
"Each of those will be individually inspected and considered as a possible contributor to resistance to [HIV] infection," according to Goldstein.
He said the researchers are confident that this approach will lead to the detection of new relevant genetic differences. One variant known to date that confers resistance is a deletion in the CCR5 gene, which encodes a co-receptor that HIV uses to enter a lymphocyte, Goldstein explained.
"And such a variation as that one would be immediately recognizable in the framework that I have just described."
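The internals of Sequence Variant Analyzer have not been described publicly yet, so the following is only a hypothetical sketch of the kind of rule Goldstein outlines, not the Duke software. It assumes variants are already annotated against coding sequence, flags coding indels whose length is not a multiple of three (which shift the reading frame), and flags amino acid changes that cross a simplified textbook grouping of side-chain properties. The class, field, and function names are invented for illustration; production tools generally rely on richer annotations and substitution scores.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified grouping of the 20 amino acids by side-chain
# property; a substitution across groups is treated as non-conservative.
AA_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "nonpolar"),
    **dict.fromkeys("STCYNQ", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


@dataclass
class CodingVariant:
    """Hypothetical record for a variant already mapped to a coding region."""
    gene: str
    kind: str                     # "snv" or "indel"
    ref_aa: Optional[str] = None  # reference amino acid (SNVs only)
    alt_aa: Optional[str] = None  # alternate amino acid, "*" for a stop codon
    indel_length: int = 0         # number of inserted or deleted bases


def likely_functional(v: CodingVariant) -> bool:
    """Return True if the variant matches one of the sketch's impact rules."""
    if v.kind == "indel":
        # A length that is not a multiple of 3 shifts the reading frame.
        return v.indel_length % 3 != 0
    if v.kind == "snv":
        if v.ref_aa == v.alt_aa:
            return False  # synonymous change
        if "*" in (v.ref_aa, v.alt_aa):
            return True   # stop codon gained or lost
        return AA_CLASS[v.ref_aa] != AA_CLASS[v.alt_aa]
    return False


if __name__ == "__main__":
    # CCR5-delta32 is a 32-base deletion, so it causes a frameshift.
    ccr5_delta32 = CodingVariant("CCR5", "indel", indel_length=32)
    print(ccr5_delta32.gene, likely_functional(ccr5_delta32))
```

Because 32 is not divisible by three, a frameshift rule of this kind would flag the CCR5 deletion immediately, which is the sort of recognition Goldstein describes.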
The scientists plan to publish a description of the bioinformatics software in the next few months and make the tools available to other researchers at that time.
Goldstein and his colleagues did not consider outsourcing the sequencing to a service provider like Complete Genomics, which promises to sequence a human genome for $5,000 by the middle of this year, because such a service is not yet available.
"We prefer having it in house now because we know we can get it done, and we know what the costs are, we know what kind of data quality is going to come out," Goldstein said. "We don't feel there is any service right now that can offer us that level of confidence immediately."
"As of today, we expect the throughput of the Genome Analyzer to meet our future needs," Shianna said. "However, we are always evaluating new technologies and would make a move if we felt a new technology was more cost-effective and provided better sequence data" in terms of, for example, read length or throughput.