Gene42 Standardizes Phenotypic Data to Improve Variant Classification, Rare Disease Dx


CHICAGO (GenomeWeb) – Gene42, a Toronto-based maker of software to support precision medicine programs, was in the news this month as a technology partner in the Gabriella Miller Pediatric Data Resource Center's newly launched research portal for pediatric cancers and structural birth defects.

The Data Resource Center partners, led by the Center for Data Driven Discovery in Biomedicine at the Children's Hospital of Philadelphia, have asked Gene42 to customize its PhenoTips platform to integrate "deep phenotyping" with genomic data. PhenoTips helps clinicians capture, standardize, and analyze phenotypic information at the point of care for patients with genetic disorders.

According to Gene42 Chief Medical Affairs Officer Pawel Buczkowicz, "deep phenotyping" means bringing as much specificity as possible to phenotypic descriptions in pursuit of better genotype-phenotype matching.

To understand the concept, Buczkowicz said that it is helpful to contrast PhenoTips data capture with the old method of charting patient encounters on paper.

"There are thousands or even tens of thousands of terms that could describe any particular symptom or physical feature of any given patient," he noted.

"If you had a pen and paper, there is no way that you would sit in front of a patient with a checklist of every single possible term. Usually, those paper checklists would have 10, 20, 30, 40 of the most common descriptions about a patient that you might see." Clinicians likely would choose the most general terms.

But what if a patient had just had a seizure?

"There are so many different types of seizures, so just checking 'seizures' is not a very good way of describing a symptom of your patient because the road to diagnosis could be split into 10 different ways or more," Buczkowicz said. If a user types "seizures" into PhenoTips, the system lists the general term, but also gives the option of digging into the Human Phenotype Ontology to find a more specific term to describe a particular patient, he explained.
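As a rough illustration of the drill-down Buczkowicz describes, the sketch below walks a toy term hierarchy from a general term to its more specific descendants. The identifiers, labels, and the `more_specific_terms` function are invented for this example and are not real Human Phenotype Ontology entries or PhenoTips code.

```python
from collections import deque

# Toy hierarchy: parent term -> more specific child terms.
# These identifiers are made up for illustration; they are NOT real HPO IDs.
CHILDREN = {
    "TOY:001": ["TOY:002", "TOY:003"],
    "TOY:002": ["TOY:004"],
    "TOY:003": [],
    "TOY:004": [],
}

LABELS = {
    "TOY:001": "Seizure",
    "TOY:002": "Focal seizure",
    "TOY:003": "Generalized seizure",
    "TOY:004": "Focal motor seizure",
}


def more_specific_terms(term_id):
    """Return the labels of every descendant of term_id, breadth-first."""
    found = []
    queue = deque(CHILDREN.get(term_id, []))
    while queue:
        current = queue.popleft()
        found.append(LABELS[current])
        # Queue this term's own children so deeper levels are visited too.
        queue.extend(CHILDREN.get(current, []))
    return found


if __name__ == "__main__":
    # Starting from the general term, list the narrower options.
    for label in more_specific_terms("TOY:001"):
        print(label)
```

A clinician who selects the general term could then be offered the narrower options in order of depth, which is one plausible way to present an ontology at the point of care.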

"This is where the specificity comes in, to be able to do a lot of those genotype-phenotype matches and gene prioritization and diagnosis suggestions. That is the rich data that is required to get those things right, rather than having these generalized terms," Buczkowicz said.

Standardizing terminology helps PhenoTips serve as a connector for genotype-phenotype association. A year ago, Gene42, which grew out of a collaboration between the computer science department at the University of Toronto and the genetics department at the Hospital for Sick Children in Toronto, started building a genomics component for PhenoTips. This has led to standardization of how PhenoTips stores and processes whole-genome and whole-exome sequencing variant data, Buczkowicz noted.

Gene42 was founded in 2014, about two years after PhenoTips development began. PhenoTips, first released as an open-source application by University of Toronto developers in early 2013, was built on observations of how several hospitals were recording data.

"Different clinicians could describe what they're seeing about a patient's symptoms or physical features, physical findings, in slightly different ways," Buczkowicz said. This could include not only a standard medical term or a common term or variations of a standard medical term, but also abbreviations, acronyms, and spelling mistakes, he noted. This might be acceptable if a human were reading through each note to understand the nuances of what was being recorded, but it was not efficient, nor was it suitable for electronic processing.

"For a particular term, you could be looking at 40 different ways in which a group of people might have written it or describe that about a patient over thousands of medical records over a year," he explained.

"If you actually want to fully utilize the capabilities of computers and machine learning, you have to standardize that data in a way that is easily readable to computers," Buczkowicz said. "That opens up a lot of different possibilities in suggesting different symptoms during a patient visit, suggesting genotype-phenotype correlations, suggesting diagnoses, and a lot of other things. The sky's the limit at that point."

PhenoTips allows clinicians to enter data in nonstandard formats — even with misspellings — then automatically maps terms to a standardized ontology for easier computation.
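One simple way to picture this kind of mapping is a synonym table combined with approximate string matching. The sketch below uses Python's standard `difflib` module for that purpose. The synonym table, the `standardize` function, and the 0.8 similarity cutoff are assumptions made for illustration, and the article does not say how PhenoTips itself performs the mapping.

```python
import difflib

# Hypothetical synonym table: free-text variant -> standardized label.
# A real system would draw these from an ontology rather than a hand-written dict.
SYNONYMS = {
    "seizure": "Seizure",
    "seizures": "Seizure",
    "fits": "Seizure",
    "short stature": "Short stature",
    "small stature": "Short stature",
    "microcephaly": "Microcephaly",
    "small head": "Microcephaly",
}


def standardize(text, cutoff=0.8):
    """Map free text to a standardized label, tolerating minor misspellings.

    Returns None when nothing in the table is similar enough.
    """
    # Normalize case and collapse repeated whitespace before lookup.
    key = " ".join(text.lower().split())
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Fall back to the closest known variant above the similarity cutoff.
    close = difflib.get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[close[0]] if close else None


if __name__ == "__main__":
    for note in ["Seizures", "siezures", "microcephly", "qqq"]:
        print(repr(note), "->", standardize(note))
```

The cutoff trades recall against false matches: a lower value catches more misspellings but risks mapping unrelated text to the wrong term.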

"Being able to connect the system like PhenoTips, which captures the standardized deep phenotype data at the point of care, and then being able to share that with services that do the interpretation is actually a very powerful way of streamlining that workflow and actually providing great clinical benefit," Buczkowicz said.

While the free, open-source version still exists — and has hundreds of institutional users on every continent except Antarctica — Gene42 came about as a company as clinician users started requiring commercial-grade support and custom development of PhenoTips.

"It required more backing than just having either their internal IT department maintain it or … installing it on a computer or server that they might have had in their lab," Buczkowicz said.

The company has raised its public profile this year.

In January, Gene42 landed a Genome Canada contract to build a nationwide cloud-based data repository under the agency's Large-Scale Applied Research Project (LSARP) grant program. The repository and analysis platform, dubbed Genomics4RD, is a piece of Care4Rare Canada's C4R-Solve, which seeks to apply multi-omics research to diagnose rare genetic diseases.

The Gene42 chief medical affairs officer said that the way PhenoTips manages data is ideal for such a repository.

"The back end is very much a database-driven model, which allows that to be connected to a lot of different systems. The standardization of the way that the data is structured allows it very well to be actually structured into a repository," Buczkowicz said.

He noted that genotype-phenotype matching and subsequent diagnosis have proved difficult with many rare diseases. "Even [if] you live in a country with millions of individuals and you might have access to health data of other individuals with different genetic conditions, the disease may be so rare that nobody would be able to find a similar patient in one particular system," Buczkowicz said.

PhenoTips is capable of sending anonymized data to PhenomeCentral, which is the Canadian arm of the international Matchmaker Exchange. "The idea of recording data in standardized formats and being able to use that data is a pretty simple approach to begin with," according to Buczkowicz, but one that other software developers sometimes overlook.

"A lot of people start off with machine learning, trying to find signal among a bunch of noise, but I think it's important to learn to walk before you run," Buczkowicz said. "Storing the data in standardized ways is actually what enables you to do matchmaking across countries in trying to find patients with rare genetic conditions that might lead to new diagnoses."
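To see why standardized terms make matchmaking tractable, consider a minimal sketch that ranks patients by the overlap between their sets of phenotype terms, using Jaccard similarity. The scoring method, the `best_matches` function, and the sample term codes are assumptions chosen for simplicity. The article does not describe the algorithm PhenomeCentral or the Matchmaker Exchange uses, and production services may score matches quite differently.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(query_terms, patients, top_n=3):
    """Rank patients by term overlap with the query, highest score first.

    `patients` maps a patient ID to that patient's set of standardized terms.
    Patients sharing no terms with the query are left out.
    """
    scored = [(jaccard(query_terms, terms), pid) for pid, terms in patients.items()]
    # Sort by descending score, breaking ties by patient ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(pid, score) for score, pid in scored[:top_n] if score > 0]


if __name__ == "__main__":
    # Hypothetical records; the term codes are placeholders, not real ontology IDs.
    records = {
        "P1": {"T1", "T2", "T3"},
        "P2": {"T1"},
        "P3": {"T9"},
    }
    print(best_matches({"T1", "T2"}, records))
```

The comparison only works because every record uses the same vocabulary. With free-text notes, two clinicians describing the same finding in different words would score as having nothing in common.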

LSARP faces a similar challenge, in that each province and territory within Canada has its own health ministry, making coast-to-coast interoperability of electronic health records and other health data elusive. "We are hoping that this project will shed light on new ways in which data sharing nationally can help reduce data costs and increase diagnostic rate and decrease time to diagnosis for patients with genetic disorders," Buczkowicz said.

A month after winning that Genome Canada contract, Gene42 announced a partnership with SeqOne, a French developer of a genome analysis platform, to augment gene sequences with phenotypic data from clinical encounters. This collaboration is meant to address the estimated 80 percent of rare diseases with genetic causes, the companies said at the time.

While both firms make analysis and machine learning technologies, SeqOne focuses more on prioritization of genetic variants, following American College of Medical Genetics and Genomics guidelines. PhenoTips handles the capture and standardization of phenotypic data to improve prioritization, Buczkowicz explained.

"A lot of times, probably the majority of the time, either the labs or the companies that do … prioritization don't get data in standard formats," he said. "You can imagine that there might be incomplete data in certain cases, and that hampers the interpretation efforts."

Seven months after announcing the SeqOne deal, Gene42 has identified two sites in France for piloting the integration, but has not publicly named them. Testing is expected to start in early 2019.

The new Gabriella Miller Pediatric Data Resource Center contract has brought some of the early PhenoTips work full circle. Gabriella Miller herself had diffuse intrinsic pontine glioma, and prior to joining Gene42, Buczkowicz researched that rare, deadly form of pediatric brain cancer at SickKids.

"The initial link to DIPG as part of this project is personally satisfying to me," Buczkowicz said.

"For everyone at Gene42, this is something that we really love to do. We want to help clinicians diagnose patients faster. We want to help patients interact with their data more easily. We want researchers to be able to have access to data more easily and move the needle forward on all of these efforts in rare diseases and cancer," he added.