Like many other omics-based research projects, the Personal Genome Project aims to develop a better understanding of the genetic bases of disease and to use that knowledge to improve public health.
Also like other studies, the PGP, initiated in 2005 by George Church, a genetics professor at Harvard Medical School, depends on volunteers who willingly donate blood, saliva, and other biological specimens, as well as demographic and personal health information such as current medications, allergies, and pre-existing conditions.
However, the PGP is distinct because its participants agree to share their sequence along with health and medical data without restriction, with the understanding that there are no guarantees of anonymity, privacy, or confidentiality; that there is some risk of harm to themselves and their relatives; and that complete removal of any publicly available data may not be possible.
The rationale for taking an open approach to consent, according to a 2008 Nature Reviews Genetics piece written by Church and colleagues, is the reality that in this age of genomic medicine, "the guarantee of absolute privacy and confidentiality [of health data] is not a promise that medical and scientific researchers can deliver any longer." Furthermore, the rapid evolution of genetics and genomics "urges us to abandon the traditional concept of medical confidentiality."
A few years earlier, in 2005, a New England Journal of Medicine article written by Isaac Kohane, director of Boston Children's Hospital's informatics program, and Russ Altman, a professor of bioengineering, genetics, and medicine at Stanford University, addressed concerns about the risks of sharing genomic and health data as part of large cohort studies by suggesting that organizers of these studies seek out volunteers who are "information altruists": people willing to bear the risks of re-identification.
Separate studies, such as one published in the Journal of Biomedical Informatics in 2004 by Bradley Malin, director of Vanderbilt University's health information privacy laboratory, and Latanya Sweeney, a Harvard University professor of government and technology, showed that it is possible to re-identify individuals by linking their de-identified hospital discharge data with publicly available demographics.
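To make the mechanics of that kind of linkage concrete, consider a minimal sketch in Python; the tables, column names, and records below are fabricated for illustration and are not drawn from the Malin and Sweeney study. The attack amounts to joining a "de-identified" table to an identified public one on shared quasi-identifiers:

```python
# Toy demographic linkage attack. All records are fabricated; the column
# names and data are illustrative assumptions, not the study's datasets.
import pandas as pd

# "De-identified" discharge data: names stripped, quasi-identifiers kept.
discharges = pd.DataFrame({
    "zip": ["02139", "02139", "37212"],
    "birth_date": ["1954-07-31", "1961-02-14", "1954-07-31"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["hypertension", "diabetes", "asthma"],
})

# A public, identified list (e.g., voter rolls) with the same fields.
voters = pd.DataFrame({
    "name": ["Alice Example", "Bob Sample"],
    "zip": ["02139", "02139"],
    "birth_date": ["1954-07-31", "1961-02-14"],
    "sex": ["F", "M"],
})

# Joining on the shared quasi-identifiers re-attaches names to diagnoses
# wherever a (zip, birth_date, sex) combination is unique in both tables.
linked = discharges.merge(voters, on=["zip", "birth_date", "sex"])
print(linked[["name", "diagnosis"]])
```

No field in the de-identified table is sensitive on its own; it is the combination that does the identifying.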
The common thread running through these three publications is that, even with the help of complex computer algorithms, perfect anonymity as it relates to genomic and health information is more myth than fact.
And though all of these publications are several years old, questions about genomic and health data privacy and security continue to dog both the scientific community and the public. Early this year, researchers from the Whitehead Institute for Biomedical Research, Baylor College of Medicine, and Tel Aviv University published a study showing that it is possible to deduce the identities of participants in NIH-funded public sequencing projects from their de-identified genetic data using freely available genetic and demographic information.
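In that study, the researchers inferred surnames by profiling Y-chromosome short tandem repeats, or Y-STRs, from published sequence data and matching them against recreational genealogy databases that pair such profiles with surnames. A toy sketch of the matching step, using a fabricated database and a made-up query profile, might look like this:

```python
# Toy surname-inference sketch in the spirit of the Whitehead-led study.
# The genealogy entries and query profile are fabricated; real attacks
# used large public Y-STR/surname genealogy databases.

# Hypothetical genealogy database: surname -> Y-STR repeat counts.
GENEALOGY_DB = {
    "Doe": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13},
    "Roe": {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 12},
    "Poe": {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13},
}

def infer_surnames(query, db, min_matching=3):
    """Rank candidate surnames by how many Y-STR markers match the query."""
    candidates = []
    for surname, profile in db.items():
        matches = sum(query.get(marker) == repeats
                      for marker, repeats in profile.items())
        if matches >= min_matching:
            candidates.append((surname, matches))
    return sorted(candidates, key=lambda c: -c[1])

# A Y-STR profile computed from "anonymous" published sequence data.
query_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
print(infer_surnames(query_profile, GENEALOGY_DB))  # best match: "Doe"
```

Combined with the age and state metadata released alongside some public genomes, even a short list of candidate surnames can narrow the search to a single household.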
The Whitehead-led study, which caught the eye of several mainstream news publications, prompted the National Human Genome Research Institute and the National Institute of General Medical Sciences to relocate some data from the publicly accessible portion of the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research to more secure accommodations, and to call for a re-examination of current methods for managing the identifiability of genomic data.
But if the ability to tie genomic and related health data back to its source isn't novel, why was the reaction to the Whitehead study's findings so strong?
Kohane said the reason is twofold. On the one hand, "we all have those expectations of privacy … whether it's privacy around your health status, privacy about your social life or about your political perspective," he told GenomeWeb Daily News. Sometimes, however, "our expectations of privacy don't actually fit what the actual practice is."
The other reason, he said, has to do with the nature of genomic data, which is inherently different from other kinds of personal information like a person's address, age, or marital status. DNA is "unique to you … it's [your] individual developmental and life maintenance program encoded" and knowing that "authorities other than medical and scientific communities" have access to it can be unsettling, he said.
"It probably is also an overestimate of our current ability to understand more about individuals from the DNA than we actually can," he added. "But nonetheless, I think there is a very basic primeval sense in which [people think] 'You've got a bit of what makes me [me]… that's not only identifying, but it tells you a lot about me and that feels creepy.'"
In an effort to alleviate concerns about the potential for data misuse, Vanderbilt's Malin said that the National Center for Biotechnology Information, for example, restricts access to its database of Genotypes and Phenotypes, or dbGaP, requiring research requests to pass through lengthy review processes before being approved or rejected.
In clinical settings, he noted, apprehension about data misuse is much lower because most hospitals have institutional review or ethics boards that set boundaries governing data use. Genomic data collection in these settings also usually has a direct bearing on the donor, since it is tied to the care the person is receiving or will receive.
But that is not the case in the research arena, where large numbers of samples and datasets are collected and shared for all sorts of analyses. "The state of the art now is this notion of restricting access [and] providing acceptable use policies," as in the case of dbGaP, he said.
Sequencing and genotyping technologies have made genomic data more accessible than ever before but, as Kohane and Altman noted in their paper, drawing useful insights from that kind of access will require much larger subject pools for study. The general consensus, at least on the research side, seems to be that clearer, more detailed legislation, coupled with a realistic understanding of the limits of de-identification and the risks of participating in population studies, will be more effective than constructing restrictive access systems.
Currently, there is very little in the way of genomic data privacy legislation. Hank Greely, a Stanford University law professor and director of the university's center for law and the biosciences, told GWDN that while some states have passed varying legislation governing genetic privacy, "there is no general federal genetic privacy legislation." (The Genetic Information Nondiscrimination Act protects against discrimination and does not address privacy issues in depth.)
On the clinical front, Greely said, genomic data is usually considered personal health information and is protected by the Health Insurance Portability and Accountability Act of 1996.
On the research side, the situation is slightly more complicated. In addition to HIPAA, interactions with human research subjects are covered by the provisions of the Common Rule, a federal rule of ethics regarding biomedical and behavioral research.
"If [genomic data] is covered by the Common Rule, there are provisions that try to protect confidentiality," including obtaining informed consent and IRB reviews, Greely said. But, "if it's not covered by the Common Rule, none of those requirements fit."
Subsequently, "a lot of the discussion recently … has been: Is genomic data personally identifiable?" he said. "Because if it's not … and we consider this information truly anonymous, then the regulatory structure has very little limits about what can be done with it."
So far, according to Greely, federal bodies seem to have adopted a rather relaxed approach to the problem, trusting that those who access the data will abide by agreements not to use it for ancillary purposes and that current anonymization procedures are acceptable. But increasing quantities of both phenotypic and genotypic information raise the potential for re-identification even in the absence of personal identifiers. Further, public access increases the likelihood that data will be used for unintended purposes.
"That's an unresolved tension," he said. On the one hand, researchers for understandable reasons "want access to as much data as they could possibly get" but, on the other, participants who signed up for a Parkinson's disease study may not be comfortable with their data being used without their consent to explore genes related to Alzheimer's disease, for instance.
Greely cited a study by Stephanie Malia Fullerton, an associate professor of bioethics and humanities at the University of Washington School of Medicine, and colleagues that surveyed individuals participating in a study called Adult Changes in Thought, or ACT. Those participants had been re-contacted and re-consented by researchers involved with the Electronic Medical Records and Genomics, or eMERGE, network, who sought to use their data for unrelated research and to include it in dbGaP.
A survey of 365 ACT participants conducted as part of that re-consenting process found that about 90 percent thought it was important that researchers obtain their permission before using their data for non-ACT research, and around 70 percent said they would have found it unacceptable had the data been shared without their permission.
It's these issues that make crafting appropriate legislation difficult, according to Greely. "Biomedical research will go better the more data people get. On the other hand, I'm in favor of people having the right to not take part in research they haven't agreed to," he said. "Those two issues are in some irreconcilable tension here."
Meanwhile, current public protections afforded by legislation like the Genetic Information Nondiscrimination Act have a limited scope. GINA protects against genetic discrimination for the purposes of health insurance and employment, but it does not extend to life insurance, long-term care, or disability insurance. Also, "just because something is illegal doesn't mean it won't happen," Greely noted.
Furthermore, he said, GINA doesn't cover so-called "dignitarian harms": instances where information that bears on a person's dignity, such as treatment for a drug problem or erectile dysfunction, becomes publicly available even if it isn't used against them. "The fact that somebody can't discriminate against you based on the information doesn't mean that you don't feel embarrassed that it's out there," he noted.
While the legal issues are being hashed out, some computational techniques under development will, Vanderbilt's Malin said, make it harder to link people to their data. One method, he said, encrypts DNA sequences in such a way that when queries are run on the data, the user cannot determine which record a returned result corresponds to.
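Malin did not detail a specific scheme, but a minimal sketch of the general idea, assuming a keyed-hash tokenization chosen for illustration rather than any particular published protocol, stores variants only as opaque tokens and answers queries with aggregate counts:

```python
# Minimal sketch of privacy-preserving variant queries. The keyed-hash
# (HMAC) tokenization here is an illustrative assumption, not the
# specific encryption method Malin described. All records are fabricated.
import hashlib
import hmac

SECRET_KEY = b"held-by-the-data-custodian"  # never shared with queriers

def tokenize(variant: str) -> str:
    """Replace a plaintext variant with a keyed, irreversible token."""
    return hmac.new(SECRET_KEY, variant.encode(), hashlib.sha256).hexdigest()

# The store holds only tokenized variants, never plaintext sequences.
records = {
    "r1": {tokenize("chr7:117559590:G>A"), tokenize("chr1:100:C>T")},
    "r2": {tokenize("chr7:117559590:G>A")},
    "r3": {tokenize("chr2:200:A>G")},
}

def count_carriers(variant: str) -> int:
    """Answer 'how many records carry this variant?' without saying which."""
    token = tokenize(variant)
    return sum(token in variants for variants in records.values())

print(count_carriers("chr7:117559590:G>A"))  # -> 2, but not *which* two
```

Because queriers see only counts, and the tokens are meaningless without the custodian's key, a match cannot be traced back to an individual record.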
He also pointed to open-consent approaches, such as the one adopted by the PGP, in which subjects accept responsibility for the risks of participation, as another way of addressing the privacy question.
"I think that gets into some interesting social and ethical issues about which individuals would feel comfortable about being involved in such an environment," Malin said. However, "to get people really comfortable with this [is] going to require some type of legal bite associated with it."
Kohane said part of the solution to the genetic data privacy quandary involves defining a common standard for expectations of privacy. He also encouraged researchers to seek out study participants who are willing to share their data regardless of the risks. In addition, he suggested implementing legislation that prohibits the use of public data from large population studies for non-research purposes. Finally, he said study participants should control who has access to their data.
"I think we have to understand that there are so many ways that our private data is leaking out and if we don't set a common standard for what our expectations of privacy are, then we will regret it," he said.
Restricting genomic data, Kohane added, will ultimately delay advancements in medical science. "If the focus is just that privacy must be 100 percent guaranteed, which it cannot be, and relative to other data sources it's probably better protected … then we as a society are giving up on the promise of using large population datasets to advance medical science," he said.