UCSD Developing Bioinformatic Tools to Help US Academics Gather, Analyze, Share Biomedical Data


By Uduak Grace Thomas

The University of California, San Diego, will use more than $25 million from two federal grants to develop tools and methods that gather, analyze, and use the large quantities of biomedical data generated by research projects at universities and institutes around the US.

One award, a five-year, $16.7-million grant, will be used to create the Integrating Data for Analysis, Anonymization, and Sharing, or iDASH, center. Among other things, the center will develop algorithms, open-source tools, and computational infrastructure and services to help scientists more efficiently share and use anonymized research data.

According to the grant abstract, iDASH "will address fundamental challenges to research progress by providing a secure, privacy-preserving environment in which researchers can analyze genomic, transcriptomic, and highly annotated phenotypical data."

As part of the iDASH project, UCSD scientists will partner with colleagues at the San Diego Supercomputer Center; the California Institute for Telecommunications and Information Technology; San Diego State University; Brigham and Women's Hospital in Boston; and Vanderbilt University in Nashville, Tenn.

The funds for iDASH came from the National Heart, Lung and Blood Institute, the National Human Genome Research Institute, the National Library of Medicine, the National Institute of General Medical Sciences, and the common fund from the Office of the Director of the National Institutes of Health.

Meanwhile, an $8.3-million grant will fund the three-year development of the Scalable National Network for Effectiveness Research, or SCANNER. This project aims to create computational systems and architecture to exchange health information collected at the point of care.

The grant is part of the American Recovery and Reinvestment Act and is funded by the Agency for Healthcare Research and Quality, an arm of the Department of Health and Human Services.

SCANNER will be developed through teams at San Francisco State University, Charles R. Drew University, the Rand Corporation, and Resilient Networks.

The iDASH center grant was awarded under the NIH's National Centers for Biomedical Computing program. It was the only new center created in the current funding cycle, which also included existing centers at Columbia University, Brigham and Women’s Hospital, and Stanford University.

In addition to meeting compute infrastructure needs, part of iDASH's goal is to provide useful data for researchers, Lucila Ohno-Machado, a professor of medicine at UCSD and the principal investigator on both the iDASH and SCANNER projects, told BioInform. She explained that while there are several existing data repositories, many contain data that needs "a lot of processing" before it can be useful for research projects.

For the first year, Ohno-Machado plans to establish the infrastructure needed to securely host the datasets and software for data analysis.

The computational infrastructure for iDASH will be provided by the San Diego Supercomputer Center, which will leverage its Triton resource. Triton includes a 2,048-processor cluster for general-purpose computing and an 896-processor cluster designed for "data-intensive" computing.

The iDASH center will also have access to SDSC's "Gordon" supercomputer, currently under construction, which is expected to provide more than 200 teraflops of compute power and four petabytes of disk storage when it is completed next year.

The collaborators will also develop policies to govern data sharing and tools that would allow users to select how and when to share their data. They will also develop data-integration tools that will enable researchers to correlate genomic data with a patient's condition as well as to integrate data collected by different institutions.

After the first year of the grant period, the team will populate the infrastructure with data and incorporate additional tools into the system.

For example, the partners plan to develop a tool to identify uncommon patterns in large quantities of data, which can be useful in healthcare settings for the early detection of complications linked to new medications, devices, or procedures. They also plan to develop a tool that will be able to identify patterns in data streams, Ohno-Machado said.

Other tools that will be developed during the five-year project include compression algorithms and a genomic data-query system.

As part of the development effort, Ohno-Machado's team and their collaborators identified three "driving biological projects" they intend to serve as a "testbed" for the design, implementation, and validation of the tools they create.

The projects were selected with an eye toward covering "different biological levels — molecular, individual, and population" to develop resources that will be useful for a wide range of projects.

The first project aims to investigate how genes and the environment influence the manifestation and treatment of Kawasaki disease, which affects the pediatric cardiovascular system. The team will incorporate genotype, microRNA, and gene-expression data to create molecular phenotypes of patients that they will relate to patient demographic and clinical data.

The second project will monitor the safety of anti-coagulation medications, and a third project will work on a wireless monitoring system to profile sedentary behavior and develop interventions to prevent obesity and cardiovascular disease.

Ohno-Machado said the team will likely put out a request for proposals at the end of the third year and select new projects for the final two years of the grant.

According to its grant abstract, SCANNER "is a distributed network infrastructure for comparative effectiveness research that provides flexibility to participant sites in the means for data sharing … The network will support retrospective analyses; prospective observational studies; clinical trials; and feedback to point-of-care users."

SCANNER complements iDASH because, while the latter project aims to create tools for storing and annotating data, the former "allows a more real exchange of those entities and … is much more related to patient data [and] electronic medical records than it is about biological, genomic data, and data streams from devices and these more specialized types of data," Ohno-Machado explained.

During the first year of the SCANNER grant, the collaborators will set up and "certify" the network and then test the system using comparative effectiveness research projects.

For example, they will compare the effectiveness of therapy management by physicians only versus physicians and pharmacists in hypertension and diabetes cases to see "whether receiving instructions and follow-up from pharmacists in addition to primary care providers enables the patient to be more compliant with the medication and whether they have better outcomes because of that," Ohno-Machado said.

Another project will compare the effectiveness of Boehringer Ingelheim's anticoagulant dabigatran, branded as Pradaxa, with the first-generation blood thinner warfarin.