Duke University researchers affiliated with a large-scale longitudinal health study are scaling up the informatics infrastructure for the project, which is now in its second phase.
The Measurement to Understand the Reclassification of Disease of Cabarrus/Kannapolis, or MURDOCK, study is taking a biomarker-based approach to develop targeted therapies for a range of diseases. It kicked off in 2007 and recently completed its first phase, called Horizon 1, which focused on biomarker discovery in legacy samples for cardiovascular disease, liver disease, osteoarthritis, and obesity.
The study is now in the midst of Horizon 1.5, which is focused on developing an infrastructure to conduct large-scale studies for biomarker discovery and population stratification that is integrated with electronic health records. The project aims to recruit 50,000 individuals. To date, 8,100 people have consented to participate.
Jessica Tenenbaum, the associate director for bioinformatics at the Duke Translational Medicine Institute and a member of the study team, told BioInform that the study’s informatics arm has so far developed two databases: the MURDOCK Integrated Data Repository, or MIDR, which currently contains biomarker and clinical data from the first phase of the project; and a registry and biorepository that contains samples as well as clinical diagnoses, demographics, family history, socioeconomic information, and more.
She said the team plans to merge the databases to create MIDR 2.0, which will be a single, standards-compliant repository that will hold information from all phases of the project including consent, clinical, and biospecimen data; electronic health records; 'omics and imaging metadata; and study metadata.
The MURDOCK study is being funded by a $35 million gift to Duke University from David H. Murdock, an American businessman for whom the study is named, and has received $48 million in grants from the National Institutes of Health under its Clinical and Translational Science Awards program.
These funds have supported several research projects that use clinical and molecular data as well as statistical techniques and software to explore disease risk, adverse outcomes, and treatment response in a range of diseases.
The money has also supported the development of the informatics infrastructure for data collection, storage, integration, and retrieval. According to a paper published in a recent issue of the American Journal of Translational Research, investigators in the project are currently developing version 2.0 of the MIDR, which will include clinical, demographic, and protocol data from Horizon 1 as well as registry data from Horizon 1.5.
Tenenbaum said that integrating these resources involves exploring a variety of standards for storing and exchanging different kinds of data and determining which ones are most appropriate for the study.
For example, in terms of 'omics data, the group is evaluating standards like the Minimum Information about a Microarray Experiment, or MIAME, in the context of research use cases, she said.
“If you are looking to make an integrated data repository that someone’s going to be able to query and say, ‘I’m looking for which experiments have this gene upregulated,’ it’s a very different question than, ‘I am looking to borrow a dataset or collaborate with someone who has done gene expression in breast cancer,’” she explained. “The use case you have is going to inform what degree of standards or which standards you use.”
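The distinction Tenenbaum draws can be illustrated with a minimal Python sketch. Every name in it (the `Dataset` class, its fields, the sample records, and the two query functions) is hypothetical and invented for illustration; none of it reflects the actual MIDR schema or any standard the team has chosen. The first query only works if gene-level results are stored in a consistent form, while the second needs nothing more than descriptive annotation of each dataset.

```python
from dataclasses import dataclass, field


# Hypothetical record layout, invented for illustration only.
@dataclass
class Dataset:
    name: str
    assay: str        # e.g. "gene expression"
    condition: str    # e.g. "breast cancer"
    contact: str
    # Gene-level results: gene symbol -> log2 fold change (may be empty).
    log2_fold_change: dict = field(default_factory=dict)


def datasets_with_gene_upregulated(datasets, gene, threshold=1.0):
    """Use case 1: requires standardized, gene-level results."""
    return [d.name for d in datasets
            if d.log2_fold_change.get(gene, 0.0) >= threshold]


def datasets_matching(datasets, assay, condition):
    """Use case 2: requires only descriptive metadata per dataset."""
    return [(d.name, d.contact) for d in datasets
            if d.assay == assay and d.condition == condition]


# Made-up example records; the values carry no scientific meaning.
CATALOG = [
    Dataset("study_a", "gene expression", "breast cancer",
            "lab-a@example.org", {"ESR1": 2.3, "TP53": -0.4}),
    Dataset("study_b", "gene expression", "breast cancer",
            "lab-b@example.org"),
    Dataset("study_c", "proteomics", "osteoarthritis",
            "lab-c@example.org", {"ESR1": 0.2}),
]

upregulated = datasets_with_gene_upregulated(CATALOG, "ESR1")
candidates = datasets_matching(CATALOG, "gene expression", "breast cancer")
```

In this toy catalog, `study_b` can still be found by a would-be collaborator even though it carries no gene-level values, which is why the second use case can get by with lighter-weight standards than the first.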
Adopting a standards-based approach will also “facilitate linkage with ongoing epidemiological studies” outside of MURDOCK’s current reach — something the study investigators have begun to consider, the AJTR paper notes.
Tenenbaum said that the team is exploring a number of options for the architecture of the merged database and is considering systems such as the Informatics for Integrating Biology and the Bedside, or i2b2, platform. At the same time, they are also considering some internally developed and commercial software packages for data analysis, although she could not disclose specific details.
Other database development efforts discussed in the AJTR paper include plans to adopt “a metadata-driven approach to augment study data with study descriptions … as well as sample status information over time.”
The metadata aspect is “one of the higher priorities … so that we’ll have a tool that people can go to and browse and see what’s in there,” Tenenbaum said. “It’s not in place yet but in the meantime [collaborators] can call anyone in the team and discuss potential directions.”
The paper also notes that MIDR 2.0 will store some kinds of data centrally while others — for instance very large imaging files — “will remain in their original location, with the metadata pulled into the central data repository.” This metadata would provide information about what “studies, samples, and datasets are available, and whom to contact for more information,” the authors wrote.
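A minimal Python sketch of that arrangement follows. It is an illustration under stated assumptions, not the MIDR 2.0 design: the `CatalogEntry` and `CentralRepository` classes, their fields, and the example paths and addresses are all hypothetical. Small datasets are stored in the central repository itself, while large files are represented only by a metadata entry that records where the file lives and whom to contact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata held centrally (hypothetical fields)."""
    study: str
    dataset: str
    data_type: str
    location: str           # where the original data resides
    contact: str            # whom to ask for more information
    stored_centrally: bool


class CentralRepository:
    def __init__(self):
        self._entries = []
        self._central_data = {}   # dataset name -> payload

    def register(self, entry, payload=None):
        """Record metadata; copy the payload only for centrally stored data."""
        if entry.stored_centrally:
            if payload is None:
                raise ValueError("centrally stored data needs a payload")
            self._central_data[entry.dataset] = payload
        self._entries.append(entry)

    def browse(self, data_type=None):
        """List what is available, optionally filtered by data type."""
        return [e for e in self._entries
                if data_type is None or e.data_type == data_type]

    def fetch(self, dataset):
        """Return central data, or a pointer to the remote file's owner."""
        for e in self._entries:
            if e.dataset == dataset:
                if e.stored_centrally:
                    return self._central_data[dataset]
                return f"Held at {e.location}; contact {e.contact}"
        raise KeyError(dataset)


# Made-up example entries.
repo = CentralRepository()
repo.register(
    CatalogEntry("demo_study", "clinical_table", "clinical",
                 "central", "data-team@example.org", True),
    payload=[{"subject": "S001", "age": 54}],
)
repo.register(
    CatalogEntry("demo_study", "cardiac_mri", "imaging",
                 "//imaging-server/demo/scan.dcm",
                 "imaging@example.org", False),
)
```

Calling `repo.fetch("cardiac_mri")` returns a pointer and a contact rather than the image itself, mirroring the paper's description of metadata that says what is available and whom to ask.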
The researchers also wrote that future releases of the database will “incorporate row-level, processed 'omic data, as well, to enable molecular data mining and analysis with appropriate governance and approval.”
Tenenbaum told BioInform that the group will charge a “fair use fee” for accessing and using MIDR data. The amount has not yet been determined, but the fee will differ for academic and industry groups and will cover upkeep costs for the repository.
Duke’s MURDOCK study is one of several large-scale investigative studies that aim to better understand risk factors for disease and provide improved criteria for selecting therapies and treatments for patients.
One example is the Electronic Medical Records and Genomics (eMERGE) Network, also launched in 2007, a national consortium of researchers studying relationships between genetic variants and phenotypes. The group has conducted studies in conditions such as Alzheimer’s, cardiac disease, asthma, and diabetes; coupled biorepositories with EMRs, as in the case of Vanderbilt University’s BioVU (BI 4/9/2010); and released tools to make sense of the data (BI 4/22/2011 and BI 3/23/2012).
The eMERGE Network is funded by the National Human Genome Research Institute and the National Institute of General Medical Sciences, and in 2010 received $25 million in grants to support a second phase of the project (BI 8/19/2011).
Another project with similar goals is the Framingham Study — a long-term study that looks at factors contributing to cardiovascular disease in residents of Framingham, Mass.
According to the AJTR paper, the MURDOCK study provides “complementary information” to these studies. For example, it “adds a population with more ethnic diversity than the Framingham Study” and it isn’t limited to genotypic variation, as in the case of the eMERGE consortium.
Additionally, researchers in the MURDOCK study can “leverage electronic health records in conjunction with subject-reported information,” which “gives us both another source of longitudinal data and the ability to do research on discrepancies between what the patients tell us and what their EHR indicates,” Tenenbaum told BioInform.
However, registries like those offered by the eMERGE consortium provide only “de-identified [EHR] data with no ability to follow up with the subject,” she pointed out.
She said the MURDOCK team is putting in place an appropriate governance structure and documentation for the MIDR data, which will also be well curated and annotated.
“We think [the registry is] a really rich exciting resource and we are very interested in forming collaborations,” she said.
Meanwhile, Duke has partnered with Laboratory Corporation of America to found the Biomarker Factory to provide a “biomarker development and commercialization outlet” for fruits of the study.
This way, “biomarkers we discover in the study aren’t just a feather in the cap of the person who publishes it but … really [have] a chance to be commercialized and get to market and impact patients,” Tenenbaum said.
MURDOCK at a Glance
The MURDOCK study is divided into three so-called horizons. The first horizon began in 2007 and used molecular and clinical data from existing patient samples in resources like Duke’s Catheterization Genetics database, which contains data and samples from over 6,000 patients who have undergone cardiac catheterization.
For this phase, researchers focused on identifying biomarkers and molecular signatures for cardiovascular disease, obesity, liver disease, and osteoarthritis.
These studies explored methods for predicting heart disease risk; factors that predicted patients' response or failure to respond to current treatments in the case of liver disease; factors that influence weight loss and gain; and tools to predict the progression of osteoarthritis and some possible treatments.
Funding for the first phase of the study ended in 2011, although the researchers involved in those projects continue to search for methods of predicting progression and treatment response in the four disease areas, Tenenbaum told BioInform.
In these studies, researchers developed and applied their own analysis methods and statistical techniques, she said. These included things like principal component analysis, logistic and linear regressions, and Bayesian models.
She added that although the group would like to make some of these tools available at a later date, “that has not been the focus so far.”
In Horizon 2, the researchers are performing prospective cohort studies to test the biomarkers and signatures generated from Horizon 1. In this phase, the teams are using samples and data from individuals who have consented to have their information and specimens included in the MIDR registry and biorepository.
So far, Horizon 2 projects include a study exploring Alzheimer’s disease; one focused on biomarkers to identify the onset and progression of multiple sclerosis; genome sequencing of centenarians; and studies to identify genetic factors associated with severe acne and treatment, and with physical performance. A sub-study explores and characterizes the location of individuals in the registry with an eye toward improving recruitment efforts and study design.
A third horizon is planned, although a start date has yet to be determined. Currently, investigators are seeking potential collaborators from multiple institutions and countries, as well as possible projects to run, Tenenbaum said.
According to MURDOCK’s website, a proposed study for the third phase intends to conduct prospective, population-based cohort analysis to identify patients who are at high risk of developing diabetes and cardiovascular disease in Kolkata, India.