The Ontario Institute for Cancer Research has kicked off a project that promises to address some key informatics challenges related to clinical sequencing.
Lincoln Stein, platform leader of informatics and biocomputing at the OICR, described the project in a presentation at the fourth Summit on Translational Bioinformatics in San Francisco this week.
Stein said that the project, conducted in partnership with Canada's Princess Margaret Hospital and dubbed the Genomics Pathway Sequencing project, or GPS, will sequence genes in normal and tumor samples excised from patients.
The goal is to report on “clinically actionable” mutations that, when combined with information in a patient’s clinical health record, will help clinicians suggest the clinical trials most likely to benefit patients based on their molecular profiles.
“The biggest challenge,” Stein said, “is trying to keep the amount of information in this report to a minimum without keeping potentially important information back [from physicians].”
Stein noted that part of the project is to “tune the report” to the physicians' comfort level, and at the same time educate them on how to work with this type of data.
In the first year of the three-year pilot phase of the study, the partners plan to recruit about 50 patients who have metastatic breast, colorectal, lung, or ovarian cancer that is refractory to standard therapies. The team hopes to “ramp up” to about 300 patients a year, Stein said.
For each individual, OICR researchers will sequence about 1,000 cancer-related genes in control and tumor samples and attempt to identify mutations that are of immediate relevance to the patient’s cancer care, or that are targeted by drugs that are currently being tested in trials.
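The tumor-versus-control comparison described above can be pictured as a set difference restricted to the gene panel. The following is a minimal sketch of that idea, not the OICR's pipeline: the `Variant` tuple, the `somatic_candidates` function, the three-gene panel, and all coordinates are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A hypothetical, simplified variant record (illustration only)."""
    gene: str
    position: int
    ref: str
    alt: str


def somatic_candidates(tumor: Set[Variant],
                       control: Set[Variant],
                       panel: Set[str]) -> Set[Variant]:
    """Return variants seen in the tumor but not in the matched control,
    keeping only those that fall in genes on the panel."""
    return {v for v in tumor - control if v.gene in panel}


# Toy data: gene names and positions are made up for this example.
panel = {"KRAS", "TP53", "BRAF"}
control = {Variant("TP53", 100, "C", "T")}
tumor = {
    Variant("TP53", 100, "C", "T"),   # also in control, so not tumor-specific
    Variant("KRAS", 200, "G", "A"),   # tumor-only and on the panel
    Variant("XYZ1", 300, "A", "G"),   # tumor-only but off the panel
}

candidates = somatic_candidates(tumor, control, panel)
print(candidates)
```

In this toy case only the KRAS variant survives both filters. A real pipeline would also have to account for sequencing error, coverage, and tumor purity, which this sketch ignores.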
The researchers will use several mutation-consequence callers to identify genes in the samples and estimate the severity of each variant. Once these mutations are identified and confirmed, the researchers will generate a draft report that will be sent to an expert panel to evaluate before a final report is sent to a cancer patient's oncologist.
Furthermore, any “incidental” variants that have no bearing on the patient’s cancer — such as novel mutations or those related to diseases other than cancer — will be sent to a review panel for confirmation before the data is released to patients and their oncologists.
The genes in the study — which include the usual suspects such as KRAS, p53, and B-Raf — were selected based on suggestions from oncologists in the Toronto area as well as data stored in knowledgebases at the Memorial Sloan Kettering Cancer Center, the Wellcome Trust Sanger Institute, and the National Cancer Institute.
To sequence patient samples, OICR will use Pacific Biosciences' single-molecule sequencing as a first step to discover novel mutations. It will then sequence known mutations on Sequenom's MassArray platform. The sequence variants will be confirmed using a Sanger sequencer housed at the clinical sequencing lab at Princess Margaret Hospital.
Stein explained that the OICR selected the PacBio system because “it allows us to do very long reads at high coverage … currently 1,000-base-pair reads in a circular consensus, which gives high accuracy for targeted genes.”
In addition, he said, the platform has a rapid turnaround time, taking only “15 minutes per run to completely analyze … a single typical gene.”
The project will also assess the clinical and sociological outcomes in both the patients and the doctors. For example, they will look at how patients may respond to news that they aren’t candidates for specific treatments because they don’t have a target mutation, or how patients and clinicians might respond to incidental findings.
In his talk, Stein said that the partnership was forged to examine the feasibility of bringing sequencing into the clinical laboratory; to show that it is possible to do genome sequencing and report the results in three weeks; to learn whether incorporating genomic information will improve health outcomes; and to determine how patients and their oncologists respond to genomic information.
For its part in the partnership, Stein's team has created an open source database where it will store all the genomic data and manage sample tracking as well as mutation and consequence-calling information, he told BioInform.
In addition, the bioinformatics team will develop a clinical database based on metadata that will contain each patient's clinical, radiological, and pathological information.
Patient samples will be stored in a biobank that is jointly maintained by the OICR and area hospitals.
To identify mutations, Stein’s team won't develop a new set of tools but will rather rely on standard bioinformatics tools such as the Sorting Intolerant from Tolerant, or SIFT, algorithm to analyze the effect of variants.
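As a rough illustration of how a SIFT result might feed a report, the sketch below labels precomputed SIFT scores using the tool's conventional cutoff, under which a substitution scoring 0.05 or below is predicted to affect protein function. It does not run SIFT itself, which derives its scores from sequence alignments. The function name, variant names, and scores are assumptions made up for the example.

```python
# Conventional SIFT cutoff: scores at or below this value are
# predicted to be deleterious; higher scores are predicted tolerated.
SIFT_CUTOFF = 0.05


def sift_label(score: float) -> str:
    """Translate a precomputed SIFT score (0.0 to 1.0) into a label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores range from 0 to 1, got {score}")
    return "deleterious" if score <= SIFT_CUTOFF else "tolerated"


# Hypothetical scores for two made-up variants.
scores = {"variant_A": 0.01, "variant_B": 0.40}
labels = {name: sift_label(score) for name, score in scores.items()}
print(labels)
```

A label like this is only a prediction, which is consistent with the project's plan to confirm variants and route draft reports through an expert panel.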
The investigators aim to keep the final report to a single page, which will provide information on specific gene variants and whether they are activating or inactivating mutations.
In addition, the team will provide oncologists with links to more detailed information about the genes included in the report so they can further study specific mutations of interest.
“This is a good example of translation in the making,” Stein said. “People have been talking about this type of project for a long time. … I know a lot of other groups are embarking on similar pathways.”
For example, the Coriell Institute for Medical Research in Camden, NJ, and the Ohio State University Medical Center launched a project last month that aims to understand how physicians can use patients' genetic risk information, included in electronic medical records, to personalize clinical care in those diagnosed with congestive heart failure or hypertension.
The project is expected to reveal whether genome-informed medicine is useful in practice and how likely doctors are to use such information (BI 02/11/2011).
Likewise, Mark Boguski is leading an effort at Beth Israel Deaconess Medical Center at the Harvard Medical School to create a computing environment that can handle patient data in compliance with HIPAA regulations, incorporate electronic medical records, and attach “medically actionable annotation” to genetic data (BI 10/22/2010).
On the commercial front, vendors such as IDBS and TransMed, which both had representatives at the conference, appear to be paying attention to the trend and are preparing their systems to help integrate genomic and clinical data.
In fact, IDBS this week announced that it had developed a translational medicine informatics platform called ORIS as part of a project with the UK's King's Health Partners that aims to integrate cancer genomic and clinical data for personalized therapies (see related story, this issue).
At the conference, TransMed officials demonstrated Cohort Explorer, which enables researchers to identify and group patient cohorts based on similar characteristics; and BioClinical Analyzer, which helps users analyze data to discover new markers.
Similarly, last year several large healthcare IT vendors, including GE Healthcare and Cerner, added new capabilities to their systems in anticipation of the trend of integrating genomic and clinical information (BI 04/02/2010).
Room for Improvement
Other talks at the conference indicated that although the field has come a long way, there is still room for improvement. For instance, there is a need for more comprehensive gene ontologies; methods of structuring, mining, and interpreting both genomic and clinical data; approaches to identify and track clinically relevant variants and their impacts; and methods of sharing data and research resources.
As an example, one presentation from a group at the Genomics Institute of the Novartis Research Foundation looked at the potential of mining structured gene annotations from Gene Wiki — a portal that aims to provide a continuously updated review article for every notable human gene — to provide more detailed annotations.
Good, who presented the work, observed that in the gene-ontology arena "there is still a lot of work to do" as a large number of genes are either "weakly annotated or not annotated at all." He proposed a method of gleaning information from the text accumulating in Gene Wiki that could perhaps lead to more detailed annotations.
The team used information from Gene Wiki to provide annotations of normal gene function and annotations of the relationships between genes and disease.
During the presentation, Good explained that his team first identified concepts — related to disease or normal gene function — in the text of the article that described the genes, assumed that the concepts were about the gene, and then compared the candidate annotations to known annotations.
For the comparison, the team used the National Center for Biomedical Ontology's Annotator Web service, limiting the search to the Gene Ontology and Human Disease Ontology, and only using exact term matches.
Out of 10,000 articles, they identified about 11,000 matched Gene Ontology terms and about 3,000 matched Disease Ontology terms. When the candidate annotations were compared to the GO database, the researchers found that 30 percent were exact matches to known annotations, while the rest were matches to more general terms or possible novel annotations.
For the Disease Ontology, the team found that 10 percent were exact matches to known annotations, 72 percent were possibly novel annotations, and the rest matched more general terms.
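The workflow Good described, which finds ontology terms in an article by exact match and then compares each candidate with the gene's known annotations, can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the team's code: it uses local string matching in place of the NCBO Annotator Web service, it assumes that "more general" means the candidate is an ancestor of a known annotation, and the article text, term list, and ancestor table are toy data.

```python
import re
from typing import Dict, Set


def find_terms(text: str, ontology_terms: Set[str]) -> Set[str]:
    """Return ontology terms that occur in the text as exact,
    case-insensitive, whole-word matches."""
    lowered = text.lower()
    return {
        term for term in ontology_terms
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def classify(candidate: str,
             known: Set[str],
             ancestors: Dict[str, Set[str]]) -> str:
    """Compare one candidate annotation with a gene's known annotations."""
    if candidate in known:
        return "exact match"
    # Assumption: a candidate is "more general" if it is an ancestor
    # of at least one term the gene is already annotated with.
    if any(candidate in ancestors.get(term, set()) for term in known):
        return "more general term"
    return "possibly novel"


# Toy inputs, invented for illustration.
article = ("The protein takes part in DNA repair, regulates cell death, "
           "and controls the cell cycle.")
terms = {"dna repair", "cell death", "cell cycle", "apoptotic process"}
known = {"dna repair", "apoptotic process"}
ancestors = {"apoptotic process": {"cell death"}}

found = find_terms(article, terms)
results = {term: classify(term, known, ancestors) for term in sorted(found)}
print(results)
```

Here "dna repair" is an exact match, "cell death" is a more general term than the known "apoptotic process" annotation, and "cell cycle" is flagged as possibly novel. Like the team's approach, the sketch assumes every concept found in a gene's article is about that gene, which is why candidates still need expert review.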
As a next step, the team selected a random sample of the candidate annotations and sent them to expert annotators from the Gene Ontology Group and the Disease Ontology group to validate their findings.
Good said that the evaluation process is ongoing. So far, 86 percent of 80 Disease Ontology candidates were considered to be of good quality. In contrast, 15 percent of 117 Gene Ontology candidates were considered good quality.
Another presentation highlighted one of the more recently spawned community challenges: the Critical Assessment of Genome Interpretation, described as a community experiment to evaluate the effectiveness of computational methods used to make predictions about how genomic variants affect phenotypes (BI 11/12/2010).
The Pulse of Bioinformatics in Healthcare
Stein noted that a major trend in the translational bioinformatics space is that researchers are no longer working on isolated sub-disciplines such as expression-array analysis, literature mining, or ontologies. Rather, they are starting to develop ways to “integrate multiple data modalities.”
Indra Neil Sarkar, director of biomedical informatics at the University of Vermont’s College of Medicine and the chair of this year’s TBI summit, echoed Stein’s comments in a conversation with BioInform, noting that the community is “getting one step closer to identifying ways to take very, very complex, multifactorial, multidimensional … features from a biological perspective, put them together, and really start to talk about how we can turn that into something that is clinically actionable.”
With the recent move to incorporate genomic information into patients' electronic health records, Sarkar added, “we are starting to see this reality between the gene landscape and the clinical landscape and say, 'How do you take this genetic information, as complex as it is, and turn it into something that a clinician can actually use as a decision point?'”
Sarkar also noted that while the translational bioinformatics field has traditionally focused on methods to manage data, it is starting to shift toward “hypothesis generation,” which he expects will be the focus in subsequent years.
“In order to get to the part of the story where we could begin to ask questions, we had to do the data management,” he said. “We are past that and now we are back to the root [of the] biological and clinical questions: 'What’s the interesting hypothesis that you can ask now?'”
Still, novel tools to handle the data are needed, Sarkar said, especially because “we are generating more data in a day than it took 20 years to generate.” Since this is unlikely to change, it poses a non-trivial informatics challenge. As such, there is still ample opportunity for bioinformaticians to ply their trade.
A key problem is that “the data-pipes are just not big enough,” he said. “We are going to need a totally new generation of data-sifting algorithms — computational techniques that are parallel beyond belief, [and] we need to figure out a way of globally coordinating and using centrally available resources without replicating them over and over again.”