Emerging clinical genomics and pharmacogenomics databases may be the critical link between genotype and phenotype, but they also raise serious privacy issues that could stifle future genomics research if left unchecked, according to a bioinformatics team at Stanford University.
Publicly available genomic resources that include SNPs, genotypes, or other forms of sequence information from individual patients may be inadvertently compromising the privacy of those donors, according to the Stanford team — even if the data are “de-identified” or “anonymized” at the time of collection. These approaches to protecting patient confidentiality are inadequate when it comes to genomic information, Stanford’s Russ Altman told BioInform last week.
“We really don’t even know the kinds of predictions that can be made from DNA sequences in the future, and therefore, to just put it out there and hope that nothing bad is going to happen is just a little bit naïve,” Altman said. “You have to assume that science is going to advance ... and so all of a sudden the pieces of DNA that we really couldn’t do much with today — in ten years, they will tell you a whole bunch of stuff about the patient, and it might even be enough to figure out who the patient is.”
Altman, along with colleagues Zhen Lin and Art Owen from Stanford’s genetics and statistics departments, penned a Policy Forum article in the July 9 issue of Science that criticized the lack of patient privacy protection in publicly available genomic resources [BioInform 07-12-04]. They argued that as few as 30-100 SNPs serve as a genomic “fingerprint” that could reveal the identity of an individual within a public sequence database. As genotyping gets cheaper and cheaper, Altman said, this situation could lead to a scenario in which “for a couple hundred or a couple of thousand dollars, someone could determine the SNPs for some individuals … and then they could go into these genetic databases and potentially get a lot more information about these individuals by matching up the SNPs.”
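A back-of-the-envelope calculation suggests why a few dozen SNPs can act as a fingerprint. The sketch below is not the Stanford team's analysis. It assumes statistically independent biallelic SNPs in Hardy-Weinberg proportions, unrelated individuals, and a reference population of 6 billion people, and the function names are illustrative only.

```python
import math


def per_snp_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumption: Hardy-Weinberg proportions with allele frequency p.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)  # AA, Aa, aa
    # Two people match if both carry the same genotype.
    return sum(f * f for f in genotype_freqs)


def snps_needed(population: float, p: float = 0.5) -> int:
    """Smallest number of independent SNPs for which the expected number
    of chance matches in `population` drops below one."""
    match = per_snp_match_probability(p)
    # Solve population * match**n < 1 for the smallest integer n.
    return math.floor(math.log(population) / -math.log(match)) + 1


if __name__ == "__main__":
    for freq in (0.5, 0.3, 0.1):
        print(f"allele frequency {freq}: {snps_needed(6e9, freq)} SNPs")
```

Under these assumptions, SNPs with a 50 percent allele frequency separate 6 billion people with 23 markers, while rarer variants (10 percent allele frequency) need roughly 60. Real SNPs are often correlated and relatives share genotypes, so the practical number would be higher.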
The real risk for human subjects, Altman explained, arises when those public SNP data are linked to phenotypic data that reveal a donor’s identity, along with his or her medical history, risk for disease, or other information that a patient may not want in the hands of a potential employer, insurer, or other party. “Any genotype/phenotype database with a public mission will have this problem,” he said.
Jean McEwen, program director for the Ethical, Legal, and Social Issues program at the National Human Genome Research Institute, said that this issue is a “huge problem” that will become “more acute as people start to construct these large databases.” While NHGRI’s ELSI group has been exploring the issue for some time, she said, “We don’t have any answers yet.”
Funding agencies like NHGRI, as well as regulatory and legislative bodies, could take steps to address these issues, McEwen said, but so far there are no specific initiatives underway to ensure patient confidentiality in public genomic databases. The 1996 Health Insurance Portability and Accountability Act, for example, requires that research data be stripped of identifying information such as names, addresses, and other demographic details, but does not explicitly address genetic data. The Genetic Information Nondiscrimination Act of 2003, currently inching its way through the US Congress, may be a step in the right direction, McEwen said, but she added that “the bill has been around for years, and nothing has materialized yet.”
For database providers like Altman’s team, however, the potential risks are too great to wait for guidance from funding agencies or governmental bodies. Stanford has been developing the Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) for several years. Initially, Altman said, the NIH-funded resource — which contains genotype data along with associated phenotype information — was designed to provide anonymous access to all the data. However, Altman said, “I don’t feel that I have adequate protection for the patients in my database to make it publicly available.” The problem, he said, is “there are no specific social mechanisms for protecting genetic research databases.”
Not only are patients at risk of privacy violations under the current situation, Altman argued, but database providers may also face lawsuits or other consequences in the event of a confidentiality breach. Altman said he’d like to see “federal laws or international agreements that say the patients who donate their DNA for science should never be worried about this, and should something happen, there will be either criminal or civil penalties.” Such guidelines would “indemnify” researchers who are building these resources, he said.
For the time being, the genotype information within PharmGKB is under password protection — only researchers from known educational institutions or commercial research organizations are eligible for access, Altman said. In addition, there is an explicit usage policy posted on the PharmGKB website that prohibits researchers from “attempt[ing] to re-identify the subjects in PharmGKB” once they have access to the complete contents of the database.
Altman said that he was reluctant to restrict access to any of the data in PharmGKB, but called the arrangement a “reasonable tradeoff.” “We can still get the data to the scientists who want to do good research, but we can show that we’ve done due diligence in trying to protect the patients,” he said.
Robert Cook-Deegan, director of the Center for Genome Ethics, Law, and Policy at the Institute for Genome Sciences and Policy at Duke University, said that such restrictions may be the only way to safeguard patient privacy for such projects. “The place to fix this is by regulating use ... by keeping the keys that link data to individuals under close guard (both digitally and with rules for users), and by being careful about what information the data link to,” he explained via e-mail. “It will become a problem if we let it,” he added.
As one of the first publicly funded genotype/phenotype databases, PharmGKB is on the “bleeding edge” of some of these issues, Altman noted, but the researchers behind other similar database efforts have already begun to tackle these challenges in their own way. The University of North Carolina system is collaborating with the National Institute of Environmental Health Sciences to build a mega-database called the Environmental Polymorphism Registry that will eventually include DNA samples from more than 20,000 patients so that UNC researchers can study links between genotypes, environmental exposure, and human disease.
Perry Blackshear, director of clinical research at NIEHS, told BioInform that the registry will provide only “the physical DNA sample” for each patient, not their sequence data. Each sample is stored in coded form, with the links between the sample codes and patient identifiers under the protection of a project steering committee. Researchers who want to use the DNA samples for genotyping or sequencing must apply for approval through an institutional review board, and must re-apply to the IRB for any follow-up studies involving the same patients, Blackshear said. Any sequence data generated from the samples does not enter the public domain, he added.
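The coded-sample arrangement Blackshear describes can be pictured with a short sketch. It is purely illustrative and is not the registry's actual software. The class name, method names, and the `irb_approved` flag are hypothetical, and the sketch assumes random codes and a single in-memory link table standing in for the records held by the steering committee.

```python
import secrets


class CodedSampleRegistry:
    """Hypothetical registry: samples carry random codes, and only the
    registry (standing in for the steering committee) holds the links."""

    def __init__(self) -> None:
        # Private link table: sample code -> patient identifier.
        self._code_to_patient: dict[str, str] = {}

    def enroll(self, patient_id: str) -> str:
        """Assign a random code that carries no patient information."""
        code = secrets.token_hex(8)
        self._code_to_patient[code] = patient_id
        return code

    def release_sample(self, code: str, irb_approved: bool) -> dict:
        """Release only the coded sample, and only with IRB approval."""
        if not irb_approved:
            raise PermissionError("IRB approval is required for each study")
        if code not in self._code_to_patient:
            raise KeyError(f"unknown sample code: {code}")
        # The patient identifier is deliberately left out of the release.
        return {"sample_code": code}


if __name__ == "__main__":
    registry = CodedSampleRegistry()
    sample_code = registry.enroll("patient-001")
    print(registry.release_sample(sample_code, irb_approved=True))
```

The point of the design is that a researcher who receives a sample sees only the code. Re-identification would require the link table, which never leaves the registry in this sketch.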
Around 1,000 patients have provided consent for their blood samples to be used for the project so far, Blackshear said, adding that the consent rate of around 75 percent has been “very gratifying.”
Another database, under development at the Mayo Clinic in collaboration with IBM, will eventually contain genomic and clinical data for more than 4 million Mayo Clinic patients. Like the UNC database, the so-called Mayo Clinic Life Sciences System “has IRB oversight … and requires a valid user name and log-on,” said Eric Klavetter, a compliance and privacy officer at Mayo. He said that the data will be available to Mayo researchers only, and “there are a number of other controls in place regarding how the data is mined.”
But developers of databases like the UNC and Mayo systems, which are created for a limited user base, can impose restrictions on the data that large-scale public efforts like PharmGKB have historically shunned. This places funding agencies like NHGRI, which has been instrumental in ensuring open access to genomic data, in a bit of a bind as it contemplates future large-scale data projects.
So far, public data resources like GenBank and dbSNP don’t endanger patient privacy because they contain no phenotypic information. “Sequence or SNP or microarray data per se are not terribly revealing. It’s when it can be linked to a person that it matters,” Cook-Deegan said. The HapMap project is also safe, as long as individual phenotypes are not included in the final data set, NHGRI’s McEwen said. Future projects, however — such as a proposed population genetics study of 500,000 patients — “would raise these issues,” McEwen said. “Will it even make the project doable? That’s an open question.”
From Altman’s perspective, it is critical that these issues be addressed before they actually become a problem. Recalling the 1999 death of a patient that put the “entire gene therapy industry on hold,” he said, “all you need is one case where something bad happens, and everybody will freeze.”