Q&A: Harvard’s Alan Beggs on the Goals of the CLARITY Data Interpretation Challenge


Children's Hospital Boston last week released the names of 30 academic institutions and commercial companies that it has selected to participate in its Children's Leadership Award for the Reliable Interpretation and Transmission of Your genomic information, or CLARITY, challenge, launched earlier this year (BI 1/30/2012).

A list of all the contestants is available here.

With a $25,000 prize at stake, participants are required to interpret the DNA sequences from three children with rare genetic neuromuscular and cardiovascular diseases.

According to CHB, all 30 contestants have received raw whole-genome and whole-exome sequence data — generated by Life Technologies and Complete Genomics — along with de-identified clinical data from the three children and their immediate relatives.

They have to submit their findings and clinical reports by Sept. 30. Results of the challenge will be announced in November at the American Society of Human Genetics annual meeting in San Francisco, Calif.

Through CLARITY, CHB hopes to identify methods to address technical and bioinformatics questions associated with analyzing DNA sequence results, including standardizing genetic variants and generating comprehensive, actionable reports that can guide decision-making by doctors, genetic counselors, and patients.

In the clinical cases being used for the challenge, all three disorders are believed to be genetically based, but so far the children have tested negative with known genetic tests.

“Traditional genetic tests examine our genes one by one, requiring doctors to have a good idea ahead of time which of our roughly 20,000 genes is the likely cause,” Alan Beggs, a professor of pediatrics at Harvard Medical School, said in a statement. “The beauty of whole-genome sequencing is that it provides results for virtually all of our genes at once. The challenge for our contestants is to pick out that one disease-causing mutation from the vast numbers of genetic differences that make each of us unique.”

This week, BioInform spoke with Beggs, who is also the director of CHB’s Manton Center for Orphan Disease Research and one of the leaders of the CLARITY challenge, to discuss why the hospital launched CLARITY as well as how its panel of judges plans to evaluate the results of the contest.

What follows is an edited version of that conversation.

Why set up a challenge for these individuals' data? Why not do the analysis and interpretation internally?

The purpose of the challenge wasn’t really for us to outsource analyzing a few families’ worth of data; really, we want to spur the field, hopefully in a collaborative way, to develop better approaches. The technology has obviously moved faster than the bioinformatics and the ethical, legal, and social issues, so now it’s possible to generate all these datasets with whole-exome or whole-genome sequences, and yet there is still a lot of uncertainty about the best way to process the BAM files or to develop variant call lists. Furthermore, there is tremendous uncertainty about what the variants actually mean in terms of their biological significance. All this leads to a considerable amount of variability in clinical reporting and uncertainty in what goes into the eventual response to the patient.

We’d like to throw this problem out there in a public way for as many different people around the world as possible … academic, commercial, and everybody in between, to give it their best shot on the exact same patients and exact same datasets so we can compare all these approaches side by side. We will publish and publicize some information about the very best ones and we hope that the contestants themselves will feel free to publish anything that they develop. We are not taking any [intellectual property] for this (everything that gets done is retained by the contestant), but we’d basically like to have a way to compare results on exactly the same group of datasets and work toward developing some consensus at every step of the way: the informatics, the interpretation, all the way down to the actual paper or electronic report that’s handed to the referring physician and to the patient.

Did you come across the families participating in the challenge during routine clinical care or were they referred to you by other clinics?

We’ve enrolled all three of these families through the Manton Center for Orphan Disease Research. They’ve all got rare diseases and in fact two of these families have been enrolled in my personal research project, which focuses on neuromuscular disease. The third family has a cardiac condition.

They’ve all been enrolled in our research studies and then separately consented and enrolled in the CLARITY challenge, in which they agreed to have their DNA sequenced and the sequences used for this purpose. In each case, we don’t know ahead of time what the result is going to be. They were chosen to represent a mix of things, and we hope there’s a somewhat reasonable chance that the contestants can find a smoking-gun mutation, but there is also a good probability that one or more of these families will have mutations in a gene that hasn’t yet been associated with their particular condition.

And it's pretty certain that in all these cases, there is a genetic cause for the disease and it isn’t something environmental, for instance?

As in any clinical situation there is definitely a chance this might not be genetic but our best guess is that it’s genetic. In the cardiac family, it does follow an autosomal dominant pattern of inheritance. We ended up enrolling six individuals there instead of just a triad of two parents and a child because it turned out we had a brother and a sister who were both affected and who each also had an affected child, so we basically have two triads.

Another one of these conditions is something called nemaline myopathy, which I’ve been studying for many years and for which we already know about six or seven genes. In the vast majority of cases, this is clearly genetic, although it follows many different patterns of inheritance: some families are dominant, some are recessive, and a lot of them turn out to be sporadic, caused by de novo dominant mutations.

When you launched CLARITY in January, you intended to accept only 20 teams, but now you've upped it to 30. Any reason why that happened?

We got a large number of very excellent applicants. I think we ended up with about 40 applications and essentially, we decided that we’d like to take everybody who we thought was qualified and able to conduct the studies.

There were one or two, unfortunately, who backed out at the last minute because their legal departments wouldn’t allow them to participate, in some cases because they worried they were going to be giving up their IP. However our intention is to let everybody do their own thing and retain all their rights. Most didn’t have a problem.

What were some of the things you took into consideration whilst evaluating the applications?

We wanted to see a mix of commercial and academic organizations. We were looking for teams that would be comprehensive, so we were hoping that they would include bioinformaticians and technology specialists but also medical geneticists and maybe ethicists. Not every team had that breadth; some of them were completely bioinformatics based, others were very heavily weighted toward clinical genetics, for example, but every team we felt would have the ability to do a complete analysis. We were also looking for geographic distribution. We didn’t want this just to be companies in the Boston or San Francisco Bay area or our academic cronies at the various hospitals we collaborate with. We wanted to get some of the big players like BGI and also get some of the smaller players who people hadn’t heard of before. This is an opportunity for them to publicize what they are capable of doing and we wanted a level playing field for all participants.

You also said when you initially launched the challenge that it was going to kick off in April. I assume things didn’t go as planned since you are just now releasing the names of the contestants?

After we announced the challenge, it took a few months to solicit and vet the contestants; simultaneously, we enrolled the families and were in the process of getting informed consent. We already had some of the DNA samples but for a few others, we had to collect them. Then we sent them out to our two commercial sponsors. Essentially, the timing of the data release depended on when we got the files back and then had a little time to post them. That happened three or four weeks after our initial target date.

Initially, we were hoping to have the responses by the end of August but with summer being a tough time for people to get a lot of things done, we put off the end of the contest until the end of September and are planning to announce the results at the American Society of Human Genetics conference this November in San Francisco.

How will you judge the results since these are not, as far as we can tell, known mutations?

We have recruited a distinguished panel of judges from both industry and academia and have asked them to take a holistic approach and look at everything. Since these are true unknowns, there is no right answer necessarily. There may be answers that sound right and lead to clinically actionable responses, and I’m sure the judges will favor these. Certainly we’re not going to give high marks to an answer that sounds wrong, but ultimately, in this instance it’s more about the process than the product. I think that’s recognizing the state of medical genetics and next-gen sequencing today in that we are not really certain in many cases. Further, there may be modifying factors involved here … it may be difficult with just one patient for each condition to be certain what the true major pathogenic mutation is, but if one of the contestants identifies something and reports on it and makes a compelling case that would be of value to a referring physician and their patient, then that will be something that they get credit for.

Some of the applications might be heavily weighted toward new informatics approaches — it may be data collection or data analysis. Others might be very strong in aspects relating to how the information is conveyed to a patient. What we are going to be looking for is the best combination of all these things.

Will there be future CLARITY challenges?

Yes, we are hoping that we are going to be announcing a CLARITY 2. It’s still under discussion but we hope to make an announcement sometime around the time that this one concludes and we announce the winner.
