Bench to bedside. Translational research. These well-worn slogans are used to describe countless programs and initiatives, but real progress toward connecting basic and clinical research activities has been difficult to achieve. Recent initiatives, however, are beginning to show progress toward improving efficiency and integration across a spectrum of historically segregated research domains.
Cancer research begins and ends with patients, especially those who graciously donate their time and tissues based on the promise that they will be facilitating new discoveries. Unfortunately, isolated clinical studies that measure only traditional clinical parameters have not produced the advances needed to significantly relieve the disease burden for many types of cancer. Similarly, molecular studies of cancer conducted without sufficient input and data from clinical investigators and pathologists have not produced the knowledge needed to truly have an impact on the lives of most patients. Recognition of these limitations has led to an increasing awareness that molecular analysis must be conducted in the context of clinical research, and that diverse data sets collected in different locations by different investigators must be aggregated and analyzed in an integrated manner.
The Cancer Genome Atlas (TCGA) program, sponsored by the US National Cancer Institute (NCI) and National Human Genome Research Institute, illustrates the impact this awareness is having. Tissue samples will be collected from patients who are part of clinical trials. Tissues must be handled using highly controlled protocols, with strict requirements for pathology verification. Clinical data that can be used for correlation with molecular findings must accompany all samples. The samples are being subjected to molecular analysis at multiple, independent centers, each with established expertise in a particular methodology. All data is being transmitted to a central data coordinating center using standard formats, terminology, and semantics. Much of the data will be openly available to the public, and sensitive data will be provided to any researcher who is approved for access based on criteria established by institutional review boards and the Health Insurance Portability and Accountability Act. The program will also provide Web-based tools that allow approved researchers to run preliminary analyses of the data without having to download the files.
Here Comes caBIG
Properly supporting the informatics component of a program like this would be a daunting and expensive prospect if not for advances made by the cancer Biomedical Informatics Grid (caBIG) program. caBIG is a cancer informatics federation of more than 80 institutions and 500 people. The program provides a forum for discussion and adoption of standards for interoperability between data and systems. The published caBIG Compatibility Guidelines provide information on how to achieve compliance with caBIG standards. The program provides support and, in some cases, direct mentoring to groups working to implement the standards.
All of the caBIG standards and technologies come together in caGrid, a service-oriented data and analysis grid. caGrid provides for the advertising, discovery, and invocation of grid services that can be physically located anywhere. The architecture is based upon several key standards and technologies. The Globus Toolkit provides the underlying grid infrastructure and Web Services Resource Framework implementation. caGrid adheres to the Model Driven Architecture paradigm and leverages the Unified Modeling Language for information modeling. Data objects are transported in an XML format that is specified by an XML Schema, and the Global Model Exchange technology from the Mobius project provides XML Schema management capability. Terminology, semantics, and structured metadata management services come from the caCORE technology suite. The caGrid security architecture supports federated user management, which means that trusted institutions can assert the identity of their own users and provide them with certificates that can be used as credentials on caGrid. Virtual organizations are specified and managed using a grid adaptation of the Grouper technology from the Internet2 project.
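The advertise, discover, and invoke pattern described above can be sketched in a few lines. This is a toy, in-memory illustration only, not caGrid code: the names ServiceAd, Registry, and expression_service, the example.org URL, and the XML element names are all hypothetical, and a local function call stands in for what would be a remote grid service invocation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceAd:
    """Hypothetical advertisement: where a service lives and what it offers."""
    name: str
    url: str                        # services can be hosted anywhere
    data_type: str                  # a concept name drawn from shared metadata
    handler: Callable[[str], str]   # stands in for a remote call


class Registry:
    """Toy index of advertised services (an assumption, not a caGrid component)."""

    def __init__(self) -> None:
        self._ads: Dict[str, ServiceAd] = {}

    def advertise(self, ad: ServiceAd) -> None:
        # A provider publishes a description of its service.
        self._ads[ad.name] = ad

    def discover(self, data_type: str) -> List[ServiceAd]:
        # A client finds services by the kind of data they expose.
        return [ad for ad in self._ads.values() if ad.data_type == data_type]


def expression_service(query_xml: str) -> str:
    """Pretend data service: parses an XML query and returns an XML result."""
    gene = ET.fromstring(query_xml).findtext("gene")
    result = ET.Element("result")
    ET.SubElement(result, "gene").text = gene
    ET.SubElement(result, "level").text = "0.0"  # placeholder, not real data
    return ET.tostring(result, encoding="unicode")


registry = Registry()
registry.advertise(
    ServiceAd("demo-expression", "https://example.org/expr",
              "GeneExpression", expression_service)
)

# Discover by data type, then invoke the first match with an XML payload.
matches = registry.discover("GeneExpression")
reply = matches[0].handler("<query><gene>TP53</gene></query>")
print(reply)
```

In a real grid the XML payloads would be validated against registered schemas and the caller would present a certificate issued by a trusted institution; both steps are omitted here for brevity.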
caBIG supports the development and deployment of interoperable research software tools, and provides training materials and sessions to ease adoption. The program provides developer tools and training templates that make it straightforward to design, implement, and support caBIG-compatible systems. Software for the management or analysis of data from gene expression studies, proteomics, pathways, tissue banking, imaging, animal models, clinical genomics, and human clinical trials is currently available. Both academic groups and commercial firms are contributing to this growing collection of caBIG-compatible software, with both open-source and proprietary products available that meet caBIG technical interoperability requirements. The open-source software is released under a non-viral license that allows for derived works and commercialization without any reach-through demands other than attribution. Several of these products are already connected to caGrid, and many others are in the process of becoming caGrid-enabled.
caBIG is transitioning out of a three-year pilot period and is becoming a permanent part of the biomedical informatics landscape. The TCGA project is just one example of a growing number of NCI-supported research programs that are being enabled by technologies from caBIG. In addition, other National Institutes of Health (NIH) institutes and centers are exploring how caBIG technologies can meet the needs of their communities, and are engaged in pilot projects in other disease areas. caBIG is one of several key initiatives contributing to the NIH Roadmap, and international interest in caBIG is also expanding, with informatics groups in several countries beginning to consider and adopt caBIG standards. With the advent of caGrid, all of these programs will be able to more easily integrate their data resources, creating a broader biomedical information grid that extends beyond cancer.
caBIG has demonstrated that the biomedical community can in fact overcome its cultural tendency toward fragmentation and isolation. The benefits to research and, one hopes, to patients should become increasingly apparent.
Peter Covitz is chief operating officer at the National Cancer Institute Center for Bioinformatics in Rockville, Md.