About a year into their efforts, the three Centers for Mendelian Genomics, funded at the end of 2011 by the National Human Genome Research Institute, have collected large pipelines of phenotypes and associated samples to identify the genetic bases of as many Mendelian disorders as they can.
With four years of funding from NHGRI, the three groups (at the University of Washington, at Baylor College of Medicine and Johns Hopkins University, and at Yale University) are working closely to try to streamline their bioinformatics and analytic pipelines, and to reduce overlap between their sequencing efforts.
The role of the centers is to provide exome or whole-genome sequencing and analysis expertise free to collaborating investigators who have patients with a Mendelian phenotype whose causal gene has yet to be uncovered.
"We've tried to be highly integrated, functioning as a single Centers for Mendelian Genomics," Mike Bamshad, one of the PIs of the UW center told Clinical Sequencing News this week.
"The goal [is] to discover the genetic basis of as many Mendelian disorders as possible … [and] that requires a huge commitment from geneticists around the world to work collaboratively, so we''ve tried to emulate that among the different centers," he said.
At the same time, David Valle, who leads the Hopkins and Baylor center, said that the three groups have tried not to homogenize the program. "Each center definitely has its own personality," he said.
With about a year behind them, the centers are planning their third face-to-face meeting this March to compare, and potentially share, analysis strategies. But each group is also trying not to overlap with the others in terms of which phenotypes it is covering, Bamshad said. The centers are also seeking cases from outside investigators as widely as possible.
To facilitate this, Bamshad said the groups are using a web portal and online repository so researchers can input phenotypic information and link to biological materials for cases they hope the centers will analyze. "We have investigators from all over the world applying to the centers," Bamshad said. UW, for example, has involved 80 investigators from 60 institutions and 24 countries in its first year of operation, he said.
The Baylor/Hopkins team has also created a phenotype review system to collect and help categorize potential cases from collaborators in a standard format, said Valle.
"More than two other centers, I think we are willing to consider samples on a family-by-family basis," he said. "So we had to be willing to seek out those families one by one … and we thought it would be important to have a rigorously evaluated phenotype, a set of evaluation criteria repeated over and over for phenotypes submitted, so we built a tool called PhenoDB to do that."
According to Valle, the Hopkins/Baylor center has a manuscript in press describing the tool. "We've also made it freely available [to the other centers] and UW is also using it now," he said.
Bamshad said that UW now has about 80 phenotypes in its analysis pipeline, representing a wide range of disease types. About 80 percent of them are known Mendelian disorders, and about 20 percent are suspected to be Mendelian but currently have no OMIM number.
"We think there are probably several thousand of those unknown examples in existence, but they are so rare, no one has put together the cases that define the syndrome yet," Bamshad said.
Eighty phenotypes doesn't mean 80 exomes for the group, though. "There are some very delineated autosomal recessive disorders where we think we can make a discovery doing one affected individual or two," Bamshad said. "In others, we might do 10 or 20 because they are expected to be genetically heterogeneous.
"And there are a number of phenotypes [we are targeting] … like cardiomyopathies, thoracic aortic aneurism — where they are really small families and singleton cases, so we are screening larger numbers," he added.
According to Bamshad, the centers give collaborating investigators a minimum of six months to review and publish results before they make public the candidate genes they have identified. UW is now almost at that six-month point with the first phenotypes it has analyzed, Bamshad said.
The Baylor/Hopkins group, meanwhile, has about 3,000 phenotypes in its pipeline, Valle said, and the group is moving through those cases.
In an interview with CSN sister publication In Sequence earlier this week, Shrikant Mane, director of the Yale Center for Genome Analysis, said that the Yale Mendelian center took on approximately 2,000 samples in its first year and, depending on the cost, plans to tackle between 2,000 and 3,000 samples a year over the course of the grant, using mostly exome sequencing but potentially also whole-genome sequencing.
He said the group is focusing on abnormal brain development, Gaucher disease, hypertension, some cardiovascular disorders, migraine, and kidney diseases. (IS 2/12/2013)
UW and Baylor/Hopkins are both using Illumina machines for their sequencing work. Valle said that the Baylor/Hopkins group is focused on exome sequencing and that Baylor is doing about 80 percent of the team's sequencing while Johns Hopkins is doing the other 20 percent.
He also said that Baylor is using its own exome-capture strategy, while Hopkins is relying on the Agilent SureSelect kit.
At UW, Bamshad said his group is also considering whole-genome sequencing in cases where exome sequencing fails and researchers expect the whole genome might offer an answer.
In the groups' upcoming face-to-face meeting, Valle said the teams plan to compare notes on sample collection and, importantly, on analytics and bioinformatics strategies.
For the Baylor/Hopkins group, Valle said analysis is almost a "family-by-family" effort. "Some parts can be uniform, but each needs hands-on attention at the end," he said. "So it's more demanding for high-throughput activities, and we've been working hard building software for that."
Bamshad said the UW team has also been working on developing strategies for de novo calling that it plans to present to the other two groups.
"Analysis is still a work in progress, I think any [of the three teams] will tell you," Valle said. "But, a year from now, I think we will have made great improvements."
A final goal of the centers is to make as much of their discovery data public as soon as possible. One stumbling block in that effort, Valle and Bamshad said, will be that the consent process used is not standard across all of their research participants.
According to both PIs, the teams want to put as much data as they can into dbGaP. But because they work with a wide range of collaborating investigators and, in some cases, legacy samples, the centers may have many datasets that they cannot share.
"We put a lot of effort into our consent form, but if someone sends in a sample collected under a different consent, and our ethics committee reviews it and says it's adequate for the project, we don't re-consent those subjects," Valle said.
Overall, Valle said, the groups are hoping to nail down phenotypes "caused by variants in more than half the genes in our genome."
"If you go to OMIM, the number of genes for which phenotypes are known is about 2,900. That's a lot, but it's only about 13 or 14 percent of the total," he said.
"I think eventually if we could get the number of disease genes up, we would be able to understand principles of genetics and disease in a much more robust way than we do now. To use the forest and the trees metaphor, we are looking now at individual trees. I'd like at some point to be able to step back and understand the forest as well," Valle said.
"Are there certain genes that are more exposed by the nature of the biological systems — so more likely to be responsible for certain kinds of diseases? Are there subsets of genes that are less exposed so that they rarely are perturbed in ways that cause disease? In order to answer those long range questions you need a lot of disease genes," Valle said.