The newly launched Serious Adverse Events Consortium has tapped a research group at Columbia University to do data analysis for a large-scale project to find genetic markers that might identify patients at high risk for adverse reactions to particular drugs.
The amount of the grant award to fund Columbia’s data coordination center is confidential, said Andrea Califano, a professor in the Joint Centers for Systems Biology at Columbia University who is helping oversee the project.
Aris Floratos, executive research director for the Joint Centers for Systems Biology, is overseeing data management for the project. He told BioInform that his team will receive its first batch of genotyping data from the SAEC next week.
The data is from a study of Stevens-Johnson Syndrome, a rare skin condition linked to adverse drug events. Floratos said that GlaxoSmithKline will be submitting the first data set from the study, from a pool of about 70 patients.
Califano said that there will be an additional 140 matched controls, bringing the total number of samples to 210. Floratos said the study could involve roughly 1 million genotypes.
Ideally, the center would have had far more subjects and matched controls to facilitate the process, Califano said. However, because SJS is so rare, gathering a large amount of data is difficult. The group will eventually study about four to six additional conditions that might include, for example, deep vein thrombosis and QT prolongation, Califano said.
Califano added that in addition to SJS, the first study will include data for another very rare disease called toxic epidermal necrolysis, or TEN. One to six people out of a million get SJS per year, said Califano, whereas a mere 0.4 to 1.2 people per million get TEN annually.
“In clinical trials it is extremely difficult to capture an event that is extremely rare, and those [rare events] can kill your drug,” he said.
Pharmas “are becoming acutely aware of the need to foresee potential downstream serious adverse effects at an early stage, and that can be done by either capturing the subset of the population that had the adverse reaction [or doing a] simple test to see if one can take it safely or not,” said Califano.
Beefing up the Infrastructure
To analyze the SAE data, Columbia’s Center for Computational Biology and Bioinformatics plans to ramp up its IT infrastructure by as many as 3,000 additional CPUs, Califano said. Columbia will be able to install 2,400 additional CPUs beyond its present 800.
“We will have about 500 terabytes of storage; we currently have about 10 terabytes,” Califano said.
The compute center houses a large rack-mounted Dell cluster, Floratos said, with four to five processors per rack and a total of 500 nodes. Each node contains between four and 16 gigabytes of memory.
Floratos said that most of the analysis tools for the project will be open source, such as MIT’s PLINK software for whole-genome association analysis.
“I expect we will also have the potential to develop some tools ourselves for analysis,” he said, noting that the center has two types of software: one for computational analysis and another for storing the data and analysis results and making them available to the public via a web server.
The center is reviewing vendors for additional hardware and software requirements, but neither Floratos nor Califano would disclose further details.
“We are not just using one tool, and then [saying now we are] done with it,” Floratos added. “It may turn out that some new methodologies may need to be coded, some algorithms that someone has proposed [that] have not been coded yet. If we decide this is something that must be done, we will develop the code for that approach as well.”
Protecting Privacy
Results from the SJS study are expected to be released to the research community for further study next fall, and data from a second study on liver toxicity will be released in 2009, Califano said. He added that results, while public, will remain anonymous. “A genome is as anonymous as your fingerprint.”
Califano said the Columbia team plans to protect the data using electronic and physical security. “Clinical and genetic data will be stored on separate systems. Requests to access the data for future studies, once they have been disseminated, will be handled via a scientific committee and will require IRB approval.”
He said that the data analysis and coordination center provides “a unique model for multiple pharmas to share both phenotypic and genetic data coming from large-scale studies.” He stressed that the not-for-profit consortium plans to ensure that knowledge about associations between SAEs and genetic markers remains in the public domain “so that it cannot be monopolized by a single entity.”
The consortium, which includes Abbott, GlaxoSmithKline, Johnson & Johnson, Pfizer, Roche, Sanofi-Aventis, and Wyeth, is hoping to gain strength from its numbers. Because these adverse events are so rare, “it is almost impossible for an institution to capture a cohort that is large enough for genetic characterization,” Califano said.