
Data Control and Sharing


From 1990 to 1992, the Havasupai Indians in Arizona participated in a study by researchers at Arizona State University that was to look into why the tribe had such a high rate of diabetes. The Havasupai gave blood samples to the researchers, and later found that those samples had been used to study a range of things — from ancestry to mental illness — that the Havasupai had not approved. The tribe then sued Arizona State, the Arizona Board of Regents, and three professors. Last year, Arizona State agreed to pay the Havasupai $700,000 and to return the samples.
More recently, in May, a plan by the National Geographic Society's Genographic Project to sample buccal cells from the indigenous Q'eros population in Peru came to a halt after some local leaders said that the researchers had not followed proper procedures, and that situation is still being resolved.

An uneasy tension exists between researchers and indigenous populations, particularly in the US, said University of Washington law professor Ron Whitener during a panel discussion at the Biology of Genomes meeting at Cold Spring Harbor Laboratory in May. "There's a lot of baggage when it comes to research, and we're not just talking about genetic research," he said. "The word 'research' is often a dirty word among indigenous groups." Whitener, who is a member of the Squaxin Island Tribe of Washington State, said that this issue dates back to the US settlement of the West and research that was "dubious and racist as a justification to take land." In the case of the Havasupai, Whitener said, there was a disconnect between what researchers said they were going to do and what they actually did. With all of that history, indigenous groups can be loath to relinquish control over their genetic information and how it is used.


On the other hand, there is a movement in the scientific community for wider and better sharing of resources, including genetic data, and that gives indigenous groups pause.


Enter dbGaP

One way that the National Institutes of Health is encouraging sharing of scientific resources is through its data-sharing policy for genome-wide association studies it supports or conducts. "One way that the policy goes is the expectation that that data — both genomic and phenotypic — will be shared with the NIH through dbGaP, so that it can be made available to others for appropriate uses in the future," says Laura Lyman Rodriguez, director of the Office of Policy, Communications, and Education at the National Human Genome Research Institute, who was part of the policy development.

As part of an NIH grant application for a genotyping project worth more than $500,000 — which is true for most, if not all, GWAS grants — researchers have to submit a data-sharing plan. "If there is something in the consent form that says, 'This data will not be shared outside of this institution,' then it's not appropriate for that data to come into dbGaP," she says, later adding that "the submitting institution is the authority on those, and so we ask them to tell us how the data can be shared."

Many studies in dbGaP do have data-use limitations. The most common limits specify what disease can be studied and disallow commercial use of the data, Rodriguez says. The limitations have to be discussed upfront, she adds. "We want to assure this is because of legitimate participant protection concerns or legitimate other concerns about the data set, that it is something in the informed consent, or that it is some other concern, and not just that an investigator is trying to avoid sharing the data," she says.

But different institutes and centers at NIH have different program priorities, which affect whether they will fund research if the results won't be shared broadly, she adds. If an institute is focused on a particular disease, it may fund a project researching that disease even if the data can't be shared. NHGRI, however, is focused on building research resources, Rodriguez says. "NHGRI is about building community resources and building tools that many people can access, and data sharing, of course, with genomics at our heart is very fundamental. We, of course, really want to see studies that are open and very accessible," she adds.

"The GWAS guidelines actually do not say, if you read them carefully, they do not say, 'You must put in genotypes.' It says you are 'strongly encouraged,'" said Pilar Ossorio, an associate professor of law and bioethics at the University of Wisconsin, at Biology of Genomes. "Then it is up to each institute to interpret that guideline, and many have interpreted it basically as a mandate that if you fund GWAS, the data have to go in."

Whitener added that this leads to a dearth of NHGRI-funded projects studying Native Americans. "When those data-sharing plans come in from the tribe, they say, 'We'll share if our tribal partners will let us share; fund us or don't fund us.' And in the area of genome-wide association studies, pretty much [it's] don't fund, even if it's highly scored," he said. "Now if you look at other institutes, they say, 'Well that's fine, our policy says there has to be sharing, you'll say you'll share if the tribe approves it, good enough, and we'll fund it.' That's basically been it."


Concerns about use

To use dbGaP data, researchers must submit a proposed research use statement that the data-access committee then compares against the data-use limitations to determine eligibility. If granted, access is for one year, and must be renewed after that. If a researcher wishes to change the research use, an amendment must be filed and approved by the access committee. "In the data-use certification agreement that every investigator agrees to as well as their institution, one of the terms and conditions of use is that they will only use the data for the specific research use they have been approved for," Rodriguez says.

Those limitations, and the reliance on institutional review boards, don't assuage all concerns. "I've been very worried about the push for greater data sharing through dbGaP," says Hank Greely, a law professor at Stanford University. "The vast majority of people whose data is in it have no idea it is in dbGaP, no idea their data might be in a federal database that is broadly shared." Texas, he adds, recently had to destroy blood spots obtained from newborns that were being used in research because the parents weren't aware of how the samples were being used.

A recent study in the Journal of Empirical Research on Human Research Ethics that Greely cites found that 86 percent of study participants from whom re-consent for submission of their data to dbGaP was sought agreed to have their data deposited. Ninety percent of the respondents to a follow-up survey said it was very or somewhat important that they were asked their permission.

Once in dbGaP, genetic data becomes a government record. As such, Rodriguez says it is subject to government procedures for access. "It's not possible for non-governmental employees to make an access decision about government data," she says. "The way that we have the ability to have community input is into the data-use limitation." The expectation, she says, is that the researcher has a long-standing relationship with the Native American or other indigenous community, that the community would express its concerns about genetic research to the researcher, and the researcher would include those concerns in the data-use limitation application. "It isn't possible on a case-by-case basis for anyone other than people on an NIH data-access committee to vote for access or to disapprove access requests," she says.

This leads to a stalemate between NHGRI and tribal groups, Whitener said at Biology of Genomes. "This refusal of NHGRI, especially for dbGaP, for there to be any ability, once samples go into dbGaP for tribes to have any continuing ability to say what's being used for those research samples — as a result, no research samples are going into dbGaP," he said. "It's really a stalemate with the United States on one side and the tribes exerting their power to essentially say, 'No we're not going to allow any of it to happen within our communities.' And thus it doesn't, which is not a good thing because the disparities that Native Americans have in the United States are in many cases the worst of all races, and research needs to occur and tribes agree that research needs to occur. But tribal leaders have to protect their communities, which is going to make them very conservative about what they decide to allow and what they don't."



Another concern Whitener raised during the Biology of Genomes panel session was that data can leak out of dbGaP. "There appears to be a lot of data creep out of dbGaP, that there's no enforcement that the data that researchers get out of dbGaP, that the guidelines for how it is shared is actually followed. Actually there is quite a bit of evidence that it's not being followed very well and I think until the enforcement of dbGaP's own rules, I doubt you are going to get much cooperation from any of the tribes," he said.

Greely cites that leak as a concern as well, adding that many of the rules are difficult to enforce. Researchers aren't supposed to share dbGaP data with other labs, but he says he doubts that all data is left behind when a postdoc or other researcher leaves one lab for another.
NHGRI's Rodriguez says that they have penalized researchers for using data improperly. One researcher, she says, came in to update the data-access committee on the work being done with dbGaP data, and discussed work that was not approved. "Something else came up and they went in a different direction," she says. "That use actually was still consistent with the data-use limitation, so it did not violate the data-use limitations, but it violated the agreement to only do what you were approved to do." That investigator's access to all dbGaP data was revoked for a period of time.


Both Whitener and Rodriguez say that more discussion and education are needed. Rodriguez says that NHGRI is trying to reach out to Native American communities to understand and work through any concerns that they have regarding genetics and genomics research, not just those involving dbGaP. "Right now, we are particularly looking at ways that we might extend that conversation to explore the issues with data deposition in dbGaP or other databases for broad access. [Native Americans] are important, we fully acknowledge and respect them, and we just want to have that conversation to try to work through what their concerns might be," she says.
"There have been conversations between NHGRI and National Congress of American Indians and various people [and] we're going to continue, I think, having conversations and hope that we can come up with some sort of system that takes [into account] both sides' concerns," Whitener added. "There has to be some other way. I think there is another way, but both sides will have to move a little bit."

Greely says that it's not just an ethics or an indigenous population issue. "Pragmatically, it's a bad idea if it provokes a backlash among people shocked to find their genome [in a database]," he says. "Like the Havasupai, I don't think they will be participating in genetic research any time soon, and it may affect hundreds of thousands of Americans."
