CLARITY Results Indicate Consensus on Clinical Bioinformatics Methods with Some Room for Improvement


The results of Boston Children's Hospital's Children's Leadership Award for the Reliable Interpretation and appropriate Transmission of Your genomic information, or CLARITY, contest suggest that the community largely agrees on the best informatics approaches for clinical genomic data analysis and interpretation, though there are some areas that could do with improvement.

CHB launched CLARITY in January to address technical and bioinformatics questions associated with analyzing DNA sequence results, including standardizing genetic variants and generating actionable reports that can guide decision-making by doctors, genetic counselors, and patients (BI 1/30/2012 and BI 8/24/2012).

A total of 23 teams took on the task of interpreting sequencing data for three children with rare, undiagnosed genetic conditions. The organizers announced the winners (first place, two finalists, and five honorable mentions) last week at the annual American Society of Human Genetics meeting in San Francisco.

The winning team, led by scientists at Brigham and Women's Hospital, was awarded a $15,000 prize. The finalists — a team from the University of Iowa and another from Genomatix, CeGaT, and the University Hospital of Bonn in Germany — were each awarded $5,000.

In terms of informatics used to generate results, Alan Beggs, CLARITY co-organizer and director of CHB's Manton Center for Orphan Disease Research, noted that most entrants used similar pipelines that incorporated many of the same tools, such as the Broad's Genome Analysis Toolkit. Most participants also used similar quality control measurements to check their findings, such as transition/transversion ratios and testing for minor allele frequencies.
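As a rough illustration of the first of those quality control measures, the sketch below computes a transition/transversion (Ti/Tv) ratio from a list of single-nucleotide variants. It is not taken from any CLARITY entry; the function names and the toy variant list are assumptions made for this example. A transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); every other substitution is a transversion. Commonly cited expectations for human data are roughly 2 for whole genomes and closer to 3 for exomes, so a ratio far from those values can hint at a high false-positive rate.

```python
# Minimal sketch: Ti/Tv ratio over (ref, alt) single-nucleotide variants.
# All names and the toy data are hypothetical, for illustration only.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """Return True if ref>alt stays within purines or within pyrimidines."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Count transitions and transversions, skipping anything that is not a simple SNV."""
    transitions = transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # indels, ambiguous bases, or non-variant records
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions


# Toy call set: four transitions and two transversions.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(toy_snvs))
```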

"What that suggests to me is that there really are a limited number of tools out there, but the good news is that there is relative consensus on what are the best approaches," he told BioInform. "Where the real variation came in was … once a list of variants was developed, how did a physician or an expert computer system or something else get used to try and link those variants to a phenotype?"

Some CLARITY participants relied on experts to look at lists of genes and make connections to phenotypes manually, while others employed more automated approaches such as literature mining tools to link phenotypes and genes.

Beggs said that the field needs to move toward more automated methods, such as data mining tools, to whittle down candidate lists before turning to experts to evaluate relationships between genes and phenotypes. "Those are the most powerful and scalable approaches."

As an example, Beggs told BioInform that some contestants listed the TTN gene, which codes for the muscle protein titin, as the likely cause for muscle weakness in one of the children in the study. Only a subset, however, successfully "made the connection between the gene and the phenotype."

While not all those who made the connection used automated approaches, those who did "gained an advantage," he said. Others were fortunate to have the right experts on hand to look at the lists and connect the dots.

However, in a clinical diagnostic setting "we can't just rely on having a human observer having the right knowledgebase" built over a period of years "to link a particular mutation with a phenotype."

Commenting on the contest in general, Beggs said that he was "pleasantly surprised" with the "number and diversity, and overall excellence of the competitors."

He said that the CLARITY organizers plan to publish a paper detailing best practices from the challenge that will include input from the judges, participants, and organizers. He added that most of the contestants "have been very forward about asking to see other entries and sharing their own information" with other teams.

"It's probably going to take a little while because there are going to be a lot of moving parts but we are starting that right now," he said.

CLARITY's organizers are also planning a second challenge focused on interpreting data from cancer genomes. The team plans to host a Clinical Bioinformatics Summit in Boston next spring during which it will hammer out the details.

The Top Three

The winning Brigham and Women's team was judged to have the best combination of bioinformatic analysis, clarity, and utility of its clinical reports for the three families whose data was used in the challenge. It also demonstrated appropriate identification of the families' likely genetic defects.

The Genomatix-led team was the only group to correctly flag every likely genetic mutation in all three families, while the University of Iowa team came up with a unique approach for returning unexpected genetic results based on patient preferences and indicating low-coverage or low-confidence regions in its reports.

For the challenge, participants were provided with whole-genome and whole-exome sequence data generated by Life Technologies and Complete Genomics from three children and their parents.

Shamil Sunyaev, an associate professor in Brigham and Women's genetics division, said that his team began its analysis by first looking at phenotypic and pedigree data from the families in the challenge to identify known candidate genes as well as variants in these genes that were associated with specific traits.

Next, the team used an internally developed pipeline based on the GATK to call variants in the sequences provided by Life Tech, then combined these results with variant calls made by Complete Genomics using its internal pipeline to generate a consensus call set, he explained to BioInform. The team then applied multiple quality control steps, including checking transition/transversion ratios and performing ancestry analysis, before moving on to variant annotation, he said.
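The article does not say how the Brigham and Women's team merged the two call sets, so the following is only a sketch of one simple way to build a consensus: keep the variants that both callers report at the same position with the same alleles, and set aside the caller-specific ones for review. The record layout and function names are assumptions made for this example.

```python
# Minimal sketch: intersect two variant call sets keyed on (chrom, pos, ref, alt).
# The dict-based record layout is hypothetical; real pipelines would read VCF files.
def variant_key(record):
    """Identify a variant by its position and alleles."""
    return (record["chrom"], record["pos"], record["ref"], record["alt"])


def consensus_calls(calls_a, calls_b):
    """Split two call sets into shared and caller-specific variants."""
    keys_a = {variant_key(r) for r in calls_a}
    keys_b = {variant_key(r) for r in calls_b}
    return {
        "consensus": keys_a & keys_b,  # reported by both callers
        "only_a": keys_a - keys_b,     # reported by the first caller only
        "only_b": keys_b - keys_a,     # reported by the second caller only
    }


# Synthetic example records, not real data.
caller_a = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "chr2", "pos": 200, "ref": "C", "alt": "T"},
]
caller_b = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "chr3", "pos": 300, "ref": "G", "alt": "A"},
]
merged = consensus_calls(caller_a, caller_b)
print(sorted(merged["consensus"]))
```

A strict intersection like this favors specificity; a pipeline that cares more about sensitivity might instead keep the union and flag which caller supported each variant.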

The group prioritized gene variants that were consistent with the challenge individuals' pedigree based on four criteria: genes that had supporting information from previous studies either in favor of or against the reported phenotype; genes that had information about molecular function; gene variants with functional significance; and the likelihood that the identified mutations were spurious, Sunyaev said.

This shortened list of variants was then handed over to a clinical genetics team and used to generate a patient report, he said.

Sunyaev said the team plans to publish details of its pipeline and is also discussing the best way to make the software used in the challenge available for broader use.

The Genomatix-led team, meantime, obtained its results by running the CLARITY data in parallel through three mapping and variant calling and annotation pipelines developed at each partner site, according to Jochen Supper, head of computational biology at Genomatix and leader of the German CLARITY team.

After analyzing the data independently, the subteams then analyzed the data with a single pipeline and then "retrospectively looked [to see] if we would have called the same variants with the other pipelines," Supper explained.

There were some "minor" differences in the results generated by each subteam's approach, "but in the majority of the cases all the pipelines would have gotten us to the final result," he said.

Once the team had its list of variants, they entered them into GeneGrid, a new web application developed by Genomatix that is designed to help medical researchers identify pathogenic genomic variations in human sequence data, Supper said.

The software lets users annotate and filter thousands of genomic variations based on a large body of medical and genomics data. It indexes and organizes users' variant information and links it to data stored in Genomatix's internal databases.

A customer could then query the system to, for example, identify variants that are heterozygous in the parents and homozygous in the child and are associated with brain development, Klaus May, the company's chief business officer, explained.
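To make that kind of query concrete, here is a minimal sketch of a trio filter that keeps variants that are heterozygous in both parents, homozygous for the alternate allele in the child, and annotated with a given term. It does not use or reproduce GeneGrid's interface, which the article does not describe; the record layout, function names, and placeholder gene names are all assumptions, and the genotype check assumes biallelic sites with VCF-style genotype strings.

```python
# Minimal sketch of a recessive-inheritance trio filter.
# Record fields, gene names, and annotation terms are hypothetical placeholders.
def _alleles(genotype):
    """Split a VCF-style genotype such as '0/1' or '1|1' into its alleles."""
    return genotype.replace("|", "/").split("/")


def is_heterozygous(genotype):
    alleles = _alleles(genotype)
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def is_homozygous_alt(genotype):
    alleles = _alleles(genotype)
    return len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] not in ("0", ".")


def recessive_candidates(variants, term=None):
    """Return genes with a het/het/hom-alt trio pattern, optionally restricted to a term."""
    hits = []
    for v in variants:
        genotype_ok = (
            is_homozygous_alt(v["child"])
            and is_heterozygous(v["mother"])
            and is_heterozygous(v["father"])
        )
        term_ok = term is None or term in v["terms"]
        if genotype_ok and term_ok:
            hits.append(v["gene"])
    return hits


trio_variants = [
    {"gene": "GENE_A", "child": "1/1", "mother": "0/1", "father": "0/1", "terms": {"brain development"}},
    {"gene": "GENE_B", "child": "0/1", "mother": "0/1", "father": "0/1", "terms": {"brain development"}},
    {"gene": "GENE_C", "child": "1/1", "mother": "0/1", "father": "0/0", "terms": {"brain development"}},
    {"gene": "GENE_D", "child": "1|1", "mother": "0|1", "father": "1|0", "terms": {"muscle contraction"}},
]
print(recessive_candidates(trio_variants, term="brain development"))
```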

GeneGrid allowed the Genomatix-led CLARITY team to then explore the variants they had identified in the families in the context of background information such as gene and disease associations, Supper said.

Following their participation in CLARITY, CeGaT and Genomatix are planning to offer free exome sequencing and analysis to six new families with conditions whose genetic causes are unknown using the tools and methods that they applied to the challenge data.

They began accepting applications last week and will accept entries until the end of November. Applicants selected to participate in the project will be notified in December, and the results will be available by March 2013.

Hospitals and doctors who are interested in registering patients for this project can get more information here.

Meanwhile, Genomatix is now offering a pre-release version of GeneGrid that enables human gene variant analysis. The company is working on integrating GeneGrid with the rest of its offerings but chose to release the software as it stands currently in response to customer demand, Supper said.

Genomatix's May told BioInform that the company plans to release a fully integrated version of GeneGrid in the first quarter of 2013, one that will work in concert with the company's offerings or as a standalone tool for clients who don't want to use Genomatix's other pipelines.

For its part, the University of Iowa-led team — made up of more than 30 researchers from 10 colleges — split into six subteams each focused on a separate aspect of CLARITY: an advisory team; bioinformatics; clinical interpretation; counseling and genetic interpretation; electronic medical records; and a variant research team.

The group's analysis workflow included Life Technologies' LifeScope software, which was used to map sequence reads and generate BAM files, and an internally developed analysis pipeline based on Galaxy, Eliot Shearer, a member of the team's informatics arm, told BioInform.

Shearer, who is a doctoral fellow in UI's departments of molecular physiology and biophysics and otolaryngology, said that the group also used Picard to identify duplicates in the data and GATK to call variants and realign indels. Finally, they applied internally developed variant annotation software to the data, he said.

Shearer said that, like the team from Germany, the UI group ran the data through several variant callers but opted to stick with the results provided by GATK because "we weren't confident in the other variant calls [since] we hadn't used anything besides GATK in a while and we didn't have enough time to essentially validate the other variant calls that we weren't familiar [with]."

However, "we did see that using multiple variant callers … increased sensitivity," suggesting that "a combination of the multiple variant calls would have increased our sensitivity," he said.
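The effect Shearer describes can be sketched with set arithmetic: against a set of known true variants, the union of two callers' output can only match or exceed the sensitivity of either caller alone, though it may also carry more false positives. The variant identifiers and call sets below are synthetic, and the function names are assumptions made for this example rather than anything from the UI pipeline.

```python
# Minimal sketch: sensitivity and precision of single callers versus their union.
# All variant identifiers below are synthetic placeholders.
def sensitivity(called, truth):
    """Fraction of true variants that were called."""
    if not truth:
        raise ValueError("truth set is empty")
    return len(called & truth) / len(truth)


def precision(called, truth):
    """Fraction of called variants that are true."""
    if not called:
        raise ValueError("call set is empty")
    return len(called & truth) / len(called)


truth = {"v1", "v2", "v3", "v4", "v5"}
caller_a = {"v1", "v2", "v3", "v9"}  # three true variants plus one false positive
caller_b = {"v3", "v4", "v8"}        # two true variants plus one false positive
combined = caller_a | caller_b

for name, calls in [("caller A", caller_a), ("caller B", caller_b), ("union", combined)]:
    print(name, round(sensitivity(calls, truth), 2), round(precision(calls, truth), 2))
```

In this toy case the union recovers four of the five true variants while its precision falls below that of caller A alone, which is the trade-off that validating each caller's output is meant to manage.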

The UI team was also recognized for its consent form — dubbed the best in the competition — which allows patients to customize the type and scope of genetic information they receive after sequencing, particularly so-called secondary findings, which aren't related to the condition under study.

"We decided that the patient was the best person to decide how much … secondary information they should receive, and we developed a consent form that allowed them to do that," Colleen Campbell, assistant director of the Iowa Institute of Human Genetics and leader of the counseling/genetic interpretation subgroup, said in a statement.

The form lays out several categories of information that patients can choose to receive based on the results of their sequence analysis, including whether they may respond negatively to a particular drug, if they carry a known disease mutation that could be inherited by their children, or if they have a mutation that will cause disease later in life, such as Huntington's disease, for which there is no treatment.
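One way to picture how such a form could drive a report is a simple filter: findings related to the condition under study are always reported, and secondary findings are reported only for the categories the patient opted into. The sketch below is an illustration under that assumption, not the UI team's software; the category labels loosely mirror the examples above, and the class, function, and gene names are hypothetical.

```python
# Minimal sketch: filter reportable findings by a patient's consent choices.
# Category labels, class and function names, and gene names are hypothetical.
from dataclasses import dataclass

SECONDARY_CATEGORIES = {"drug_response", "carrier_status", "late_onset_untreatable"}


@dataclass(frozen=True)
class Finding:
    gene: str
    category: str  # "primary" or one of SECONDARY_CATEGORIES


def reportable(findings, opted_in):
    """Keep primary findings plus secondary findings in opted-in categories."""
    unknown = set(opted_in) - SECONDARY_CATEGORIES
    if unknown:
        raise ValueError(f"unknown consent categories: {sorted(unknown)}")
    allowed = {"primary"} | set(opted_in)
    return [f for f in findings if f.category in allowed]


findings = [
    Finding("GENE_A", "primary"),
    Finding("GENE_B", "drug_response"),
    Finding("GENE_C", "carrier_status"),
    Finding("GENE_D", "late_onset_untreatable"),
]
report = reportable(findings, opted_in={"drug_response"})
print([f.gene for f in report])
```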

Besides the three finalists, five additional CLARITY teams received a "special mention" for their contributions: the Clinical Institute of Medical Genetics; the Research Institute at Nationwide Children's Hospital; the Science for Life Laboratory of the Karolinska Institute; Scripps Genomic Medicine and Scripps Translational Science Institute; and a team composed of SimulConsult and Geisinger Health System.