Ohio State University's Center for Clinical and Translational Science has developed a private cloud-based platform that adds to a growing cache of informatics resources for translational research efforts in the life science community.
Endowed with a $1.6 million grant from the National Institutes of Health's National Center for Research Resources, the OSU team began developing the Translational Research Informatics and Data Management Grid, or TRIAD, system a year ago. The project is part of OSU's participation in the Clinical and Translational Science Award program that the NIH launched in 2008.
"The problem we are fundamentally trying to solve [with TRIAD] is that the data that our scientists are looking to interact with live in lots of different places and the only practical way to get the technology out of their way and allow them to do the science that they are pursuing is to give them an easy-to-use ... solution," Philip Payne, chair of the biomedical informatics department at OSU's medical center, told BioInform. "Using a distributed or cloud architecture was really the only path that we saw to do that with."
Since the system is working with patient data, security concerns led OSU to build a private cloud rather than choose a public offering like Amazon Cloud Services, Payne said. However, he added that there wouldn’t be any "technical barriers" to incorporating a public cloud infrastructure "into our mix" at a later date if need be.
OSU used the funds to build a private HP-based cloud, an enterprise storage backend, and a set of data transfer, exchange, and analysis tools, as well as to support a staff of five developers and a system administrator. The system's core functionality is based on the caGRID architecture developed by the National Cancer Institute's Cancer Biomedical Informatics Grid project.
TRIAD has been deployed at OSU and around 20 other CTSA sites and other NIH-funded programs. Additionally, a growing community of researchers is developing extensions and services to run in the TRIAD environment, which also offers resources such as user forums, a community wiki, and knowledge centers.
Following the conclusion of the first round of funding, OSU has received a second $300,000 grant from the NIH to enlarge TRIAD's borders. The researchers intend to use the money to build additional research tools and to explore ways of implementing and maintaining the system locally in other academic research institutions.
Part of that involves making it easier for institutions that want to implement TRIAD for their own researchers but don't have the infrastructure and personnel to handle it, Payne said.
In line with this plan, OSU is in discussions with an unnamed Ohio-based informatics startup that would potentially provide fee-based support for those who may need help deploying and using the TRIAD architecture.
"The vendor relationship is much less about selling the software because we will continue to make all the software components available freely through an open source distribution mechanism," Payne said. He explained that the vendor will serve as "an expert service provider" for institutions "that need support and want to hire contractors to help them with deployment and utilization" rather than hiring staff or relying on their internal resources.
Although the community of TRIAD users is on hand to help, Payne believes that this support is more for the system's "active developer community." Increasingly, he said, he has found many sites that want to use the system but have minimal technical expertise and compute power.
In terms of infrastructure, implementing the system requires at least a single server with eight cores running dedicated virtual machines and a hardware security module, according to OSU. It estimates that the hardware will cost a prospective user institution between $4,500 and $6,500.
In addition, the week-long initial installation process requires a system administrator to set up and clone several virtual environments. OSU estimates that it would also require about two Java programmers to produce the applications that use TRIAD.
"We know there are people who have practical problems that they want to solve and that’s largely better served through the vendor community," he said. "We package the software ... [as] a turnkey solution ... but if they don’t have the system support and the technical infrastructure to deploy a service that they need for a particular project, this company would allow them to contract for those services externally rather than having to hire in-house staff."
If the deal goes through, the unnamed vendor expects to have a business plan in place before the end of this year, Payne said.
An Interpreter of Tongues
TRIAD is built on the caGRID framework that underlies NCI's caBIG. At the OSU site, hardware for the system comprises an EMC backend storage infrastructure that provides users with storage on a case-by-case basis, along with 28 dual-processor, multi-core blades, each of which provides 16 gigabytes of RAM. About 10 terabytes of disk space are needed to support the infrastructure, Payne said.
Among other resources, TRIAD includes a bulk data transfer tool for moving things like images or high-throughput sequencing data, as well as services for exchanging patient cohort discovery data between institutions.
On the analysis side, TRIAD offers platforms for applications like computer-aided diagnosis using biomedical imaging, as well as adaptors to run custom R scripts and other analysis tools for next-generation sequence data. It also offers application programming interfaces that let users connect Matlab servers to the system, among other applications.
TRIAD pulls data into the private cloud environment where it can be translated into a "language" that the end user's data analysis tools can understand, its developers explained in a statement.
"When it comes to biomedical research, you have the digital equivalent of the Tower of Babel. One piece is written in French, another is written in Russian and maybe a third component is in Chinese," Payne said. "TRIAD acts like the ultimate interpreter between all the different languages that biomedical data comes in."
This interpretation is made possible by an "extensible knowledge management infrastructure," Payne told BioInform.
"Various source or recipient information systems, especially in the healthcare and biomedicine space, use a broad variety of data representation and coding standards but we also have significant knowledge from the biomedical informatics community as to how [to] map between those representation standards," he explained.
This "metadata repository ... allows us to define these common data elements so [that] as developers are building ... services within the TRIAD environment, they are able to annotate them with these common data elements" in real time.
As a result, when a user makes a request to either query data or submit data to an analytical service, the system can create links between "those representational and semantic schemas," Payne said, adding that this capability "takes the onus off the individual developer to develop their service to be 100 percent self-contained around the semantics and logical modeling of the data."
The final output is in XML format and the semantics are linked back to common data elements that are encoded in the metadata repository.
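The mapping idea described above can be illustrated with a minimal sketch. It is not TRIAD's code or API: the site names, field names, `cde:` identifiers, and the `to_common_xml` helper are all hypothetical, and a small in-memory dictionary stands in for the metadata repository. The sketch only shows how two sources with different local field names could be rendered as XML whose values point back to shared common data elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a metadata repository: it maps a
# (source system, local field name) pair to a common data element (CDE) id.
# Every name below is invented for illustration.
CDE_MAP = {
    ("site_a", "sex"): "cde:gender",
    ("site_b", "patient_gender"): "cde:gender",
    ("site_a", "dx_code"): "cde:diagnosis",
    ("site_b", "diagnosis"): "cde:diagnosis",
}


def to_common_xml(source, record):
    """Render one source record as XML whose values cite CDE identifiers."""
    root = ET.Element("record", {"source": source})
    for field, value in record.items():
        cde = CDE_MAP.get((source, field))
        if cde is None:
            continue  # fields without a mapping are skipped in this sketch
        item = ET.SubElement(root, "value", {"cde": cde})
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_common_xml("site_a", {"sex": "F", "dx_code": "C91.4"}))
    print(to_common_xml("site_b", {"patient_gender": "F", "diagnosis": "C91.4"}))
```

Under these assumptions, both sites' records come out with the same `cde` attributes even though their local field names differ, which is the property that would let a downstream service stay agnostic about each source's schema.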
Although the volume varies, Payne estimates that TRIAD handles at most around one to two terabytes of data per day.
TRIAD uses existing research databases such as those created with Vanderbilt University's Research Electronic Data Capture, or REDCap, application, as well as resources in the Informatics for Integrating Biology and the Bedside, or i2b2, infrastructure. As a result, OSU explained, it doesn't replace resources that researchers are already familiar with, but rather expands the information and types of data repositories that they can access.
TRIAD also includes tools to ensure that patient privacy is maintained while accessing and storing tissue samples and medical records.
The system enables researchers to anonymously match tissue samples with de-identified clinical data from medical records using an "honest broker protocol" that doesn't require additional approval for each individual study or require access to patient identifiers such as names, addresses, and medical record numbers.
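The article does not describe how TRIAD's honest broker protocol is implemented, so the following is only a generic sketch of the honest-broker idea under stated assumptions: a trusted intermediary (the hypothetical `HonestBroker` class) is the sole holder of the link between medical record numbers and random study codes, and researchers receive only coded rows with identifying fields dropped. All class, function, and field names are invented for illustration.

```python
import secrets


class HonestBroker:
    """Hypothetical broker: the only party that can link MRNs to study codes."""

    def __init__(self):
        self._codes = {}  # medical record number -> opaque study code

    def code_for(self, mrn):
        # Issue a random code the first time an MRN is seen, then reuse it.
        if mrn not in self._codes:
            self._codes[mrn] = secrets.token_hex(8)
        return self._codes[mrn]

    def deidentify(self, rows, keep):
        """Swap the MRN for a study code and keep only the listed fields."""
        return [
            {"study_code": self.code_for(r["mrn"]), **{k: r[k] for k in keep}}
            for r in rows
        ]


def match(samples, clinical):
    """Join coded tissue samples to coded clinical rows by study code."""
    by_code = {c["study_code"]: c for c in clinical}
    return [
        {**s, **by_code[s["study_code"]]}
        for s in samples
        if s["study_code"] in by_code
    ]


if __name__ == "__main__":
    broker = HonestBroker()
    samples = broker.deidentify(
        [{"mrn": "123", "name": "A. Patient", "tissue": "bone marrow"}],
        keep=["tissue"],
    )
    clinical = broker.deidentify(
        [{"mrn": "123", "name": "A. Patient", "diagnosis": "hairy cell leukemia"}],
        keep=["diagnosis"],
    )
    print(match(samples, clinical))
```

In this sketch the researcher-facing `match` function never sees a name or record number; only the broker could reverse a study code, which is the separation of duties the honest-broker approach relies on.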
This need for security factored into OSU's decision to build a private cloud rather than a public infrastructure, though the OSU team hasn't ruled out the use of a public cloud infrastructure at some point.
"I think what we focused on is not necessarily building a monolithic infrastructure but something that allows us to create networks of clouds or networks of networks," Payne explained.
"The idea is for it to be able to grow heterogeneously and elastically," he added. "There are already instances, for example, where we are connecting between our infrastructure and other service-oriented or cloud infrastructures for various academic endeavors throughout the country as part of the CTSA cancer center program so I think that heterogeneous mixture is critical to our success."
At present, OSU is using TRIAD to develop data collection, integration, and analysis pipelines that can track phenotypic information for maternal-fetal dyads as part of efforts to understand preterm birth and to develop methods to prevent it.
A separate project is using these pipelines to identify patient and tissue cohorts in various disease contexts from biospecimen repositories in several institutions and an enterprise data warehouse.
Other projects include an effort by the Hairy Cell Leukemia Research Foundation to create a virtual data warehouse of information and resources to explore the pathophysiology of hairy cell leukemia — a rare disease that’s diagnosed in 2,000 patients a year worldwide — and investigate potential treatments in large-scale patient cohorts.
With the renewed funding, TRIAD's developers plan to further develop its metadata repository in response to issues raised by the end-user community.
"We found that this issue of knowledge management, understanding the mapping and semantics that exist between these data resources, is largely one of the most critical needs for the use of this technology," Payne said. "As a result of that, we have focused in the proposal to further enhance that software, to make it more user friendly, so non-technical developers can contribute to that knowledgebase."
A part of the funds will also support the development of small business innovation proposals as well as partnerships with companies in Ohio to further develop the service offering around the software.
OSU is joined on the translational research tool development front by several efforts. IDBS, for example, is leading a UK government-funded consortium to create a cloud-based informatics platform that will support stratified and translational medicine research and collaboration across organizations (BI 06/17/2011).
Additionally, the Finnish government has funded a national project called Biomedinfra that aims to link the country's biobanking, bioinformatics, and translational research resources, as well as connect to a much larger EU effort to integrate resources in these three areas (BI 01/21/2011).