Q&A: All of Us Data Center Ramps Up With Lofty Goals, Technical Challenges

Joshua Denny

CHICAGO (GenomeWeb) – This month, the Precision Medicine Initiative announced that it had begun enrolling beta testers in the All of Us research program.

Formerly known as the PMI Cohort Program, All of Us seeks to collect, store, and disseminate health, genetic, lifestyle, and environmental data on at least 1 million US residents as a way to jump-start and advance research on personalized medicine.

Recruitment of participants is underway in three cities and will eventually include more than 100 sites, ranging from academic medical centers to health systems, community hospitals, federally qualified health centers, and other locations. People not attached to one of the participating health systems will be able to donate DNA samples at places like Quest Diagnostics laboratories and Walgreens pharmacies, according to Joshua Denny, associate professor of biomedical informatics and medicine at Vanderbilt University.

"We're definitely focused on recruiting a diverse population. That's a core principle," Denny said.

Denny heads the All of Us data and research center; Vanderbilt is the lead organization on the data side of the program, with considerable assistance from Alphabet life sciences subsidiary Verily and the Broad Institute. "We hold all the data and we connect to the front-end apps," Denny said.

Everything is being stored in Google Cloud, supported by numerous other technologies. For example, Vanderbilt is developing computational tools, a cloud workbench, and a set of cohort browsers, kind of a front end for others to enter data, Denny explained.

Others among the pilot set of seven provider organizations and additional FQHCs are building participant-facing apps to collect patient data from DNA samples, patient interviews, and electronic health records, then forward the information to the research and data center. "They send us their EHR data across all these different recruitment sites," Denny said.

Right now, Vanderbilt is just storing samples, as it just began recruiting participants on May 31. "We will later go through and do genetic testing," Denny said. Other sites have their own timetables.

Denny spoke with GenomeWeb about some of the challenges in managing so much information from sites all over the country, including the perennial interoperability issue that has plagued so many other health IT projects.

Below is a transcript of the conversation, which has been edited for clarity.

How did you design the data and research center?

Vanderbilt has put together a large DNA biorepository with about 240,000 people. That's the full force of phenotypes. The only way you can do any study about genetics is by using an EHR in our resource. That became a model for All of Us, to use EHR data as a way to passively collect a really rich set of phenotypes across a diverse range of disease.

Interoperability has been an issue with EHRs for a long time, as you well know. How are you addressing that issue?

Each person who comes in has certain measurements taken and they also will donate blood and urine. For these individuals, we will derive our own genetic testing. We're not relying on the EHR for genomic information. But we are having to process the EHR for research purposes, and there are interoperability issues.

One of the ways we're addressing that is using a common data model. We have adopted the OMOP Common Data Model [from the Observational Medical Outcomes Partnership]. We chose OMOP because it's more inclusive [than newer models like the National Patient-Centered Clinical Research Network (PCORnet)] and we have a lot of connections to it. It has more software around it. From the beginning, it tried to include all labs. It has an NLP extension for how to store NLP-derived concepts. We are pretty deeply connected with OMOP.

Sites are standardizing their data to OMOP and sending it. We are centrally doing [quality control] on the data and looking for things that look like they are not quite in place and maybe discord between sites.
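As a rough illustration of the kind of central check Denny describes, the Python sketch below flags rows that are missing required fields and sites whose median value for a measurement departs sharply from the other sites. It is not the program's actual pipeline. The `site` key, the placeholder concept ID 1001, and the 50 percent tolerance are assumptions made for the example; only the column names `person_id`, `measurement_concept_id`, `measurement_date`, and `value_as_number` follow the OMOP measurement table.

```python
from collections import defaultdict
from statistics import median

# Columns every submitted row is expected to carry (names follow the OMOP
# measurement table; which ones to require is an assumption for this sketch).
REQUIRED_FIELDS = ("person_id", "measurement_concept_id", "measurement_date")


def find_incomplete_rows(rows):
    """Return the rows in which any required field is absent or None."""
    return [r for r in rows if any(r.get(f) is None for f in REQUIRED_FIELDS)]


def site_medians(rows, concept_id):
    """Median value_as_number per site for one measurement concept.

    The "site" key is hypothetical: it stands for whatever label the
    central team attaches to each submitting organization.
    """
    by_site = defaultdict(list)
    for r in rows:
        if (r.get("measurement_concept_id") == concept_id
                and r.get("value_as_number") is not None):
            by_site[r["site"]].append(r["value_as_number"])
    return {site: median(values) for site, values in by_site.items()}


def discordant_sites(rows, concept_id, tolerance=0.5):
    """Sites whose median differs from the median of all site medians
    by more than `tolerance`, expressed as a relative difference."""
    medians = site_medians(rows, concept_id)
    if not medians:
        return []
    overall = median(medians.values())
    if overall == 0:
        return []
    return sorted(
        site for site, m in medians.items()
        if abs(m - overall) / abs(overall) > tolerance
    )
```

A site that reports body weight in pounds while the others report kilograms, for instance, would stand out in `discordant_sites`, which is one plausible form of "discord between sites."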

For direct volunteers, we're also promoting the Sync for Science standard. That's an emerging technology that meets [federal EHR incentive program] "meaningful use" Stage 3 as far as an API to access your information. That data basically leverages FHIR [the Fast Healthcare Interoperability Resources standard]. That's an emerging thing that we're piloting now, so there could be some FHIR-based data, too.

A lot of the standards in FHIR are similar to what you see in OMOP. OMOP, you could say, is a database representation of data that could be transported and shared via FHIR. It's how you organize your data after you receive it.
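To make the transport-versus-storage distinction concrete, here is a minimal Python sketch that turns one FHIR Observation resource, as it might arrive over an API, into a row shaped like the OMOP measurement table. It is an illustration under stated assumptions, not the project's mapping code: the `person_ids` and `concept_lookup` dictionaries are hypothetical stand-ins for a participant index and a vocabulary mapping, and a concept ID of 0 is used for codes the lookup does not cover.

```python
def observation_to_measurement(obs, person_ids, concept_lookup=None):
    """Map a FHIR Observation (parsed JSON dict) to an OMOP-style measurement row.

    person_ids:     hypothetical dict from FHIR patient reference
                    (e.g. "Patient/abc") to an integer person_id.
    concept_lookup: hypothetical dict from (code system, code) to a
                    standard concept ID; unmapped codes get 0.
    """
    if obs.get("resourceType") != "Observation":
        raise ValueError("expected a FHIR Observation resource")

    coding = obs["code"]["coding"][0]          # first coding only, for brevity
    quantity = obs.get("valueQuantity", {})    # absent for non-numeric results
    lookup = concept_lookup or {}

    return {
        "person_id": person_ids[obs["subject"]["reference"]],
        "measurement_concept_id": lookup.get((coding["system"], coding["code"]), 0),
        "measurement_date": obs["effectiveDateTime"][:10],  # keep YYYY-MM-DD
        "value_as_number": quantity.get("value"),
        "unit_source_value": quantity.get("unit"),
        "measurement_source_value": coding["code"],
    }


# Example input: a body weight observation coded with LOINC.
example_observation = {
    "resourceType": "Observation",
    "subject": {"reference": "Patient/abc"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "29463-7"}]},
    "effectiveDateTime": "2017-06-15T10:30:00Z",
    "valueQuantity": {"value": 72.5, "unit": "kg"},
}
example_row = observation_to_measurement(example_observation, {"Patient/abc": 42})
```

The FHIR resource is nested and self-describing, which suits exchange between systems; the resulting row is flat and keyed by IDs, which suits querying across many participants once the data has been received.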

Will you require all of the participating organizations to use these standards?

Participating organizations are sending data to us now in OMOP. Before we even recruited our first participants, we did some of what we call "data sprints" to work through and send some data. That went well and was a great learning experience for everyone.

We're not starting with the full EHR, but we hope to work our way there. As you highlighted, interoperability is challenging, and getting these data out of an EHR is challenging, especially at the scale that we're talking about across the country. It is something where we're starting with certain steps and we're moving forward to expand into others. We're starting with things like billing codes and visits data and medications and labs, and moving to more labs, structuring those, and notes over time.

What is your timetable for getting everything off the ground, for starting to sequence some of the samples, and disseminating information?  

Data will be accessed primarily centrally through the research tools that we're building. The expectation is that there will be progressive releases as we enroll participants. Probably in 2018 at some point, you'll see some data become available for research. It depends on a number of factors: developing tools, getting data, the speed at which we recruit individuals.

What sorts of information will you make available?

There will be genetic data on the individuals at some point. We haven't figured all of these things out yet in terms of what the protocols will be, but certainly our priority is to have genetic data.

We have blood, urine, and EHR data, and participant surveys coming in, and we will have mobile and digital health devices as well. Remember, we also have those in-person visits, where we take some baseline measurements on height, weight, BMI, waist, hip circumference, [and] blood pressure. Those kinds of protocols may extend over time, too. It's a combination of data they actually give us and data that we can passively collect. We really have the ability to go across all those modalities over time. But a lot of these protocols aren't worked out yet.

There are still some things we are working out, like genetic data and what kinds of sensor data we might collect. Some of this will be done through pilots, and we'll see how it works. We may find that a given device just sounds awesome, but we do a pilot with it and it doesn't work so well, so we don't do it.

Our desire is to get a robust and multidimensional data set out there for really broad usage. We need to have enough participants in that batch to make that resource a useful resource for people.

The anticipation is we won't wait until everyone is enrolled. We will do something on an ongoing basis. It will take four or five years to get a million people enrolled, but we will have useful data before then for sure that will be accessible by others.  

What are some other goals and technical issues you are grappling with?

There is a harmonization issue around all of these types of data. I don't know that there has ever been a disseminated project like this to pull together. There have been networks that have done this on smaller scales that have shown success, one of these being the eMERGE Network. But there are a lot of things here that we hope are breaking new ground.

One of the other things that makes us unique is that participants will have access to their information. They will be able to log in, too, as they get genetic test results. This is not a clinical trial, but people can run clinical trials out of it, for instance.

I hope that propels genetic medicine in general, not just research, but lots more people will know things about their genotypes, and that could drive care. That could change the way community physicians start thinking about this stuff because it becomes more common.
