Hewlett-Packard recently hit a milestone in a collaboration with the Harvard Medical School-Partners Healthcare Center for Genetics and Genomics (HPCGG), delivering a 64-node HP Linux cluster.
But according to VK Holtzendorf, life science strategic program manager for HP’s high-performance computing division, the cluster is “the easy part” in a multi-year, multimillion-dollar project that ultimately aims to integrate clinical and genomic data for nearly 2 million patients in the Partners system.
“The strategic thing we’re doing with Harvard-Partners is the merging of the research environment to the clinical environment, but we’re still early on in that process,” Holtzendorf told BioInform.
The collaboration exemplifies HP’s broader outlook for what it calls the health — not healthcare — IT market, said Jeff Miller, vice president of health and education industries at HP.
“We prefer to drop the ‘-care,’” he said. “The reason we call it health is when we look at the industry today, the traditional boundaries — payor, provider, life sciences, clinical product development — they’re all sort of blurring together, and it’s very difficult to find companies who stay in one realm.”
Miller said that HP’s “whole strategy in the health industry is based on the premise that these different groups that once were separate organizations have become much more intertwined, and the information needs of those people in fact are much more intertwined as we go forward.”
The HPCGG cluster is one step in that direction — albeit on a limited scale. The system will provide researchers at Partners’ member institutions — which include Brigham and Women’s Hospital and Massachusetts General Hospital in addition to Harvard Medical School — shared access to distributed computing, storage, and data resources. But Holtzendorf pointed out that the challenges of clinical genomics data management extend well beyond accessibility.
“Now that we have people actually using genomic medicine in a clinical setting, the question is, are their needs different? And are their requirements for connections into an internal system different? And the answer is obviously yes. There are clearly security issues and compliance issues and those sorts of things when you’re dealing with clinical data — individual patient data — versus anonymized research data,” she said.
Isaac Kohane, director of bioinformatics at HPCGG, agreed. “What we’re finding is that the challenges are now not only just in having the horsepower to crunch large data sets, but essentially in organizing the data appropriately, making sure that the annotations are consistent, and essentially making sure that the analyses that we perform are accurate rather than just haphazard calculations.”
Holtzendorf said that HP and HPCGG will continue to tackle these issues as the project, which began in September 2003 [BioInform 10-6-2003], moves forward. Nearly a year and a half into the collaboration, HP’s services organization has worked with HPCGG to redesign its LIMS framework and has installed some PCs and other smaller-scale systems. So far, she admitted, the straightforward infrastructure aspects of the partnership “haven’t changed the world, but we fully intend to do that.”
Three-Step Plan for Health IT
Of course, HP isn’t the only big IT player to target the broad healthcare universe. Early last year, IBM merged its life science and healthcare business units into a single group with a very similar vision. In October, Carol Kovac, general manager of IBM’s Healthcare and Life Sciences Solutions group, told BioInform that the combined effort represents a $4.8 billion business for the company, ranging from hospitals and payors on the healthcare side to pharmaceutical R&D and academic research groups on the life sciences side. “These are very different enterprises, but we see them all really kind of operating with a common set of guiding forces … so they’re all really part of the same ecosystem, and that’s why we brought our businesses together,” she said at the time [BioInform 10-11-2004].
Like HP, IBM has a high-profile project to showcase its efforts in clinical genomics. The company has been working with the Mayo Clinic since 2002 to integrate its clinical and genetic databases, and has already launched a set of tools and services, dubbed the Clinical Genomics Solution, for integrating, storing, and analyzing genotypic and phenotypic data.
“IBM is a very formidable competitor, and we see them in the market quite frequently,” Miller conceded. However, he noted, “We stand on our own merits, and we think we offer a solid solution base and a solid approach to the market.”
Noting that most industry analysts estimate that the health IT market is growing 10 to 14 percent per year, Miller said that competition is to be expected. “Clearly a marketplace with a growth rate like that is one in which HP and a number of companies are very interested in addressing.”
Miller said that HP’s approach to the market has three primary focus areas: integration, insight, and aggregation.
Integration is central to HP’s work at HPCGG, Miller said, in terms of merging genomic sequence data or biomarker information with data in clinical databases. Insight is a natural follow-on to integration, he noted, but requires more analytical tools on the part of HP and its partners. “We’re looking at different ways to annotate information so that we can begin to better understand it, we’re looking at ways to manipulate information and store information so that it’s easier to retrieve, and different ways to actually act on information,” he said. One approach that HP is still in the “early stages” of looking at is visualization techniques, he said, “to improve the ability to understand data — whether it be genetic data or other data that we’re beginning to model in terms of the biosciences environment.”
The third leg of HP’s health strategy, aggregation, focuses on improving how new data is added to the existing pool of information. This is particularly urgent for clinical data, where physicians prefer to keep paper records. Miller said that HP is developing a technology called “digital pen and paper” that will enable clinicians to “automatically capture the information in structured data as they go through the protocols with the patients. … This allows us to quickly get it into an information system where it can be managed, where it can be compared with other data, where we can run analysis of information against it.”
HPCGG’s Kohane said that capturing clinical data remains a bottleneck in advancing genomic medicine. “Whereas genome data has an automated data-access pipeline, clinical data is still this painful, laborious process of engineering clinical systems — and when I say clinical systems, I don’t mean just the information system — I mean the whole system for delivering clinical care.”
The impact of this problem is evident from the amount of data HPCGG has online. “Although the genomic data is right now a smaller volume than the clinical data, it’s accruing much more rapidly because it’s commoditized,” Kohane said.
Kohane estimated that HPCGG currently has around a terabyte of textual clinical data, which increases by a factor of 100 when image data is included. On the genomics side, he said, microarray data and images together total only around a terabyte right now, “but it’s growing much more rapidly than the clinical data.”
In addition, integrating the two data types has been painstaking. So far, “for small numbers of genes” in “subgroups” of patients, “we have DNA sequencing results in a database that we can join against the clinical database on those patients,” Kohane said. “The data that crosses both sides has really only started to grow in the past couple of years.”
However, he noted, these challenges present “an interesting opportunity” for Partners and HP, “which is given the fact that all this investment has already been made by Partners in establishing processes by which various data types on the clinical care patients can be obtained — because that is relatively rare in the United States — what are the kinds of questions that we can now answer, given that data and the genomic data?”