A thorough discussion about personal genomics — what it means for the average consumer, the health care system, and the research community — often raises more questions than it answers. While the public discourse on genetic privacy can be traced back to the days of the Human Genome Project, only recently has a new era been ushered in — thanks to the steady decrease in the cost of DNA sequencing — with promises of a tailor-made approach to medical treatment and new discoveries from rich genetic data sets. Depending on whom you ask, personal genetic information either deserves to be protected at all costs as personal property, or is merely information fit to be published online for the whole world to see, containing nothing more revealing about health than, say, the knowledge that someone smokes.
That there is such concern over whether genetic information is more vulnerable to attack or misuse than traditional personal health care records may be an unintended consequence of the hype that touted personal genomics as a means to discover all there is to know about a person. "We've done it to ourselves. The way we were selling this idea of genomic data providing much better insight into who you are, and your future, than all other types of data — it was very effective and we really meant it," says Misha Angrist, an assistant professor at Duke University's Institute for Genome Sciences and Policy. "However, until we find all that dark inherited matter that we haven't identified to date, most of our genomic signals are not nearly as disclosing as knowing your health risks because you're a smoker. But you can imagine that if one wanted to be a bit worried about the health care system, having a whole community saying that we're going to [be] developing the preeminent disclosing data source, people who were likely to get scared were going to get scared."
During the first few months of 2011 alone, advocacy groups like the Forum for Genetic Equity, concerned citizens, and a handful of state representatives began efforts to protect personal genetic information. In January, a group of Massachusetts state senators introduced the Massachusetts Genetic Bill of Rights in an attempt to make up for perceived shortcomings of the federal US Genetic Information Nondiscrimination Act of 2008. "This is a new era in medicine and we need to make sure that [there are] some sort of safeguards. We thought it would be important to deal with this quickly, rather than at some future time after perhaps people had already lost some of their rights," says Massachusetts Senator Harriette Chandler, who is a lead sponsor of the bill. "There obviously are issues; the insurance industry is not going to like this because they feel that they want to know any genetic information that the individual has, but what we're trying to say is that basically genetic information is your property, like any other property you have, and you have a right to privacy with respect to genetic material."
In March, representatives in Vermont and California introduced similar bills in an attempt to declare genetic information the exclusive property of the individual, among other protections. These bills face many hearings and debate sessions before having a chance of being enacted into law, but the movement is expected to spread across the country.
In 2008, a PLoS Genetics paper from Stan Nelson's lab at the University of California, Los Angeles, set off alarm bells in the personal genomics community, calling into question researchers' abilities to keep genetic data truly private. The article reported that individuals could be identified from anonymized GWAS data sets, a finding that led to changes in how researchers think about genomic data and informed consent as well as new practices for databases housing genomic data.
While the paper did not set out to assess privacy or make claims about security within genome-wide association studies, co-author Nils Homer, now a senior staff scientist at Life Technologies, says the team discovered that some of the concepts they were exploring could be used to identify trace contributions in DNA mixtures. This led them to conclude that an individual participant in a particular study could be identified, raising serious concerns about researchers' abilities to maintain privacy and anonymity — something participants are promised in exchange for consenting to have their genetic data used in research. "I think it made people think about informed consent in detail because, regardless of the study design or effort in anonymization, you are looking at genetic signals, so you're going to have some types of personalized contribution," Homer says. "No matter what you do to try and randomize things there's always the probability that there is a signal of an individual in there. It also made people think about what it means to participate in a study and how to make the public understand what they're consenting to when they do these types of tests or participate in these studies."
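The core of the attack is simple enough to sketch. The following Python toy (an illustrative reconstruction using simulated data, not the authors' code) computes the kind of distance statistic described above: for each SNP, it asks whether a person's genotype sits closer to the pooled mixture frequencies than to a reference population's.

```python
import random

def distance_statistic(person, mixture_freqs, ref_freqs):
    """Sum over SNPs of |y - ref| - |y - mixture|: a clearly positive
    total suggests the person's genotype is closer to the mixture than
    to the reference, i.e. they likely contributed to the pool."""
    return sum(abs(y - r) - abs(y - m)
               for y, m, r in zip(person, mixture_freqs, ref_freqs))

random.seed(0)
n_snps, pool_size = 10_000, 100

# Illustrative reference allele frequencies for each SNP
ref = [random.uniform(0.05, 0.95) for _ in range(n_snps)]

def genotype(p):
    """Draw two alleles; report a person's allele fraction (0, 0.5, or 1)."""
    return sum(random.random() < p for _ in range(2)) / 2.0

# A pool of study participants, and the aggregate (mixture) frequencies
pool = [[genotype(p) for p in ref] for _ in range(pool_size)]
mixture = [sum(person[i] for person in pool) / pool_size
           for i in range(n_snps)]

member = pool[0]                       # contributed to the mixture
outsider = [genotype(p) for p in ref]  # did not

print(distance_statistic(member, mixture, ref))    # clearly positive
print(distance_statistic(outsider, mixture, ref))  # near zero
```

Summed over tens of thousands of SNPs, even a one-in-a-hundred contribution to the mixture pulls the statistic measurably above zero, which is why published aggregate frequencies alone can betray participation.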
The findings also gave the National Institutes of Health pause. Until the Homer et al. paper came out, aggregate genomic data had been openly available — that was standard practice, not just for NIH's databases, but for many others. "That 2008 paper changed how we had been thinking about the privacy of genomic data in aggregate or pooled samples because, prior to that time, we thought it was impossible to learn anything about an individual in pooled DNA," says Laura Lyman Rodriguez, director of the Office of Policy, Communications, and Education at the National Human Genome Research Institute. "If you could get that much information, there was other information available at NIH about what the characteristics of that pooled sample were. To make every effort to protect the privacy of the individuals, the NIH moved all of our aggregate genomic DNA information behind controlled access, where we have very specific agreements and expectations with all of the users [of] controlled-access data at their institutions."
Cause for concern?
At a 2010 American Association for the Advancement of Science panel, experts weighed in on how well current government policies — like those set forth by the US Department of Health and Human Services — protect volunteer privacy and anonymity in genomics research. Various camps voiced positions that ran the gamut from regarding genetic information as something that should be treated with the utmost care and sanctity to saying DNA is the ultimate identifier, even when made anonymous, thereby rendering the privacy debate moot. "Some people have the understanding that there's really no problem because we don't even have ways of discriminating against each other based on it, and I don't even mean the legal GINA stuff," says Sharon Terry, president of the nonprofit health advocacy organization Genetic Alliance, who helped chair the panel. "For example, I can't look up your genetic information and know you're going to be stupid about X, Y, or Z. I can't say that, because the genome doesn't tell us that kind of thing."
According to Terry, there still very much exists a wide range of understanding about what genomic information means, as compared to other types of health or personal information. There are also questions about whether or not the Health Insurance Portability and Accountability Act, known as HIPAA, is enough to protect personal genetic data or if additional laws need to be put in place. "We are in an age where we do need regulation, and there's no consensus about whether or not HIPAA is enough or whether we need something else," she says. "The biggest thing that most people say is that there needs to be some way of punishing people who intentionally misuse or try to create data sets whereby they identify people, and right now there isn't anything like that."
In addition, there is a difference between the ethics of privacy and the laws that regulate it. Policing the research community and direct-to-consumer genetic testing companies to ensure that regulations are enforced and appropriate censures are issued when things go wrong is important from both legal and ethical perspectives. But perhaps the more fundamental threat to the future of personalized medicine is that the public's trust is at stake. "In the US, if you have a federal mandate to remove the identifiers when using data for research purposes and you end up violating that requirement, you might be subject to a federal investigation or maybe receive fines," says Bradley Malin, an assistant professor at Vanderbilt University School of Medicine. "But at the same time, you also violate the trust your patient population had in you, and that, more than anything, is probably the biggest problem. Because even if it's not a major monetary harm, people may not want to participate in research studies or [may] pull their data out of existing research studies."
Consumers should be extremely worried about the privacy and security of their data, says Charis Eng, chair of the Genomic Medicine Institute at the Cleveland Clinic. Eng and a team of lawyers and ethicists combed through some DTC genetic testing company consent forms and came away unimpressed. "We actually had our legal people look at some of them because we need to educate our primary care physicians about how DTC genetic testing companies work, and they informed us that it's not a consent, it's a contract," Eng says. "When you select 'I Agree' on many of these DTC consumer company consent forms, it actually says that you are giving them permission to sell your genomic data to pharmaceutical companies — it is stated, but in very, very couched terms buried somewhere."
But is selling information to a pharmaceutical company really so bad as long as customer data remains de-identified and customers have consented to their data being accessed for research? "Morally, it's a bit cat-and-mouse because DTC genetic testing companies hide the sales aspect away in the consent form," Eng says. "Now, in the best of worlds, [the data is] anonymized and if pharma uses it to make drugs, I think that's great. But the fear is always that the data could be re-identified and [DTC companies'] security protocols might not be as secure as, say, an academic or federal institution. And as our lawyers and bioethicists point out, what if the laws change in the year 2030 and suddenly you're not protected by X and Y? That's something people need to think about."
Behind the firewall and beyond
The new era of personalized medicine is not unlike space tourism in that, despite having previously been a realm explored only by well-trained experts, everyday consumers are now sold on a fascinating and exciting ride where they are assured that they will be ushered safely along and that no harm will befall them. To make good on that promise, those responsible for overseeing the security of personal genomic data at DTC genetic testing companies generally look to HIPAA for guidance. However, many IT managers are quick to point out that keeping genomic data safe also means keeping track of cutting-edge data security standards such as ISO/IEC 27002, an information security standard developed by the International Organization for Standardization and the International Electrotechnical Commission.
"While we are complying with HIPAA as a required minimal baseline, I'm implementing and strengthening our controls based on ISO/IEC 27002, and I'm personally connected to 50 LinkedIn groups that are all privacy- and security-related, so every day I'm getting the latest thinking from experts and lawyers," says Michael Cox, chief privacy officer at Pathway Genomics. "I also read the annual data breach investigation reports that the big consulting firms put out that highlight the root causes of most of the breaches, so I'm constantly evaluating how to mitigate those risks."
From an IT perspective, Cox says threats can arise either externally, in the form of hackers after intellectual property or personal information, or internally, when employees with ready access to the data are to blame for a breach. However, no data is ever completely secure all of the time, especially when there are no clear-cut regulations that speak directly to how DTC genetic testing companies should manage IT operations.
Internal threats to privacy are not always thwarted by impenetrable IT security technology and clearly defined privacy protocols. After all, personal genomics is still a wet lab science where the security X-factor is human involvement. Last year, 23andMe delivered the wrong test results to some 96 customers, a problem the company attributed, on its blog, the Spittoon, to a lab error: the incorrect placement of a single 96-well plate used to process samples at its contract laboratory. While the mix-up was not a nefarious act, many of the customers who momentarily questioned their child's paternity, or were led to believe they were at risk for some disease, were less than pleased.
Alex Kompel, director of systems engineering and operations at 23andMe, keeps up with industry standards through groups like the Open Web Application Security Project, a nonprofit focused on improving the security of application software, and also implements HIPAA de-identification guidelines. "It's really about how you keep your security system in-house, how you manage protocol and data flows. Building the technology around that is easier once you have a clear picture of how the data is being managed," Kompel says. "When we initially launched our Web site, we consciously made the decision that all communications to the Web site [were] going to be over the encrypted HTTPS protocol — we just consciously decided to encrypt everything. But we're always looking at how other people do things, and what are the best practices."
Knome, another DTC genetic testing company, has opted to avoid the Web as a delivery channel altogether by instead sending its customers password-protected and encrypted thumb drives. "In the personal genome side of our business, we deliver the results of the analysis that we've done on a secure thumb drive that is locked and encrypted with a key, and the only person who touches that is the customer. We either mail it to them or hand-deliver it to them as part of a roundtable discussion," says Jim D'Augustine, chief technology officer at Knome. "No identification information about the individual ever passes through our systems, and we will not deliver across the Internet, but across these secure keys, so that the customer is always in control of the information."
Knome uses an encryption product by a company called IronKey that is commercially available and offers varying degrees of complexity. "We use those to establish password access to the key. All of the data on the key is encrypted and the passwords are maintainable through the IronKey infrastructure," D'Augustine says. "Many folks are concerned about their privacy and they would rather be safe than sorry, and this basic approach has been very well received [by] the customer base. Even if you lose the passcode key, it's extraordinarily unlikely that anyone else could do something with [your thumb drive] even if they wanted to."
Kári Stefánsson, the founder of deCODE Genetics, which offers the deCODEme personal genomics service, says that DTC genetic testing companies can perform all the security due diligence in the world, but that it is ultimately up to the consumer to maintain security over his or her genetic test results. "It's relatively easy to protect the privacy of the customer to the extent that they want us to do that, but the vulnerability lies in their own hands, how well they protect their own passcodes, and things like that," he says. "You remember the principle of old high school physics that the measurement with the greatest error risk is what determines the error of the experiment? Well, I think the point where it's most difficult to guarantee privacy is where the individual handles this information."
Knowledge is power
Initiatives like the Personal Genome Project — which aims to eventually publish 10,000 genomes online — approach personal genomics in a manner that is not primarily focused on concerns about data security and anonymization. Instead, they maintain that the best way of insulating individuals against any shocks that may result from some unexpected violation of privacy is with a thorough crash course in genomic worst-case scenarios. Before they can submit a sample for analysis, potential PGP enrollees are asked to take an exam that tests their understanding of both the science and the risk of potential re-identification. "The best approach that we can hope for is to tell people what the risks are upfront," Angrist says. "These range from the realistic — including non-paternity and late onset disease risk disclosure — to the more fanciful sci-fi things — like you could be cloned or your DNA could be planted at a crime scene — in order to make sure that people have thought about what scenarios are likely and unlikely, but still possible."
Life Technologies' Homer points out that the genomics research community and DTC genetic testing companies need to constantly make better efforts to keep the public informed about privacy risks, however small. "I think the consent side is what you want to focus your attention on, trying to explain to the public and research participants what they're contributing when they contribute their DNA, and how that could affect them going into the future — making sure that the individual knows what they're consenting to in unforeseen consequences," Homer says. "We can all come up with these scenarios where a relative finds out you're in the data and you share a common risk allele, or your grandfather is part of a schizophrenia study, and while I think the actual risks there are very low, you still need to work on the consent part to make sure people understand that there is a minor chance of that happening."
Isaac Kohane, chair of the informatics program at Children's Hospital Boston, says that ultimately, the best way forward on the privacy front for personal genomics might be to put individuals in the driver's seat by implementing a selective sharing policy, similar to the way users on social networking Web sites can pick and choose which bits of information they reveal to network users.
"We do this all the time — patients have autonomy and they get to choose what they want to share and they should have full autonomy over what they share with doctors as well as researchers. And if it's not perfect, that's tough for the doctors and tough for the researchers," Kohane says. "The notions of personal autonomy and privacy — in our society at least — are still paramount, and there are increasingly sophisticated solutions that allow you to have that level of functionality in our health care data."
In January, Malin and his colleagues published a paper in the Journal of the American Medical Informatics Association describing an approach they developed for analyzing the demographics of patient cohorts in five medical centers for the NIH-sponsored Electronic Medical Records and Genomics network. They found that an alternative de-identification model could protect data sets at the same re-identification risk level as the HIPAA Safe Harbor standard.
"Now you can get away from that cookbook approach and you can quantify what the likelihood of identification is for these features that are known to be potential identifiers and have been designated so by Health and Human Services, and use an alternative model," Malin says. "The risk will never be zero under any regulation — you could tell me that in order for me to share data, that each record needs to correspond to at least 20,000 people in a population, but that still means that there's a one in 20,000 chance that I could identify that record. So, we try to give a more quantitative policy analysis approach and come up with acceptable technological approaches."
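Malin's one-in-20,000 arithmetic can be made concrete with a toy risk calculation in Python. The records below are hypothetical, and a real analysis would measure group sizes against the population rather than just the data set, but the principle is the same: a record's re-identification risk is the reciprocal of the number of people who share its quasi-identifiers.

```python
from collections import Counter

# Hypothetical records keyed by quasi-identifiers (age band, sex, ZIP3),
# the kinds of fields designated as potential identifiers by HHS.
records = [
    ("30-39", "F", "372"),
    ("30-39", "F", "372"),
    ("30-39", "M", "372"),
    ("40-49", "F", "601"),
    ("40-49", "F", "601"),
    ("40-49", "F", "601"),
    ("50-59", "M", "941"),  # a unique combination
]

def reidentification_risk(records):
    """Risk for each distinct quasi-identifier combination is 1/k,
    where k is the number of records sharing that combination."""
    groups = Counter(records)
    return {qi: 1.0 / k for qi, k in groups.items()}

risk = reidentification_risk(records)
print(max(risk.values()))  # 1.0: the unique record can be singled out
```

A policy demanding that every quasi-identifier combination match at least k people caps the risk at 1/k but, as Malin notes, never drives it to zero.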
In order to make the challenge of facilitating effective research and maintaining privacy more tractable, Malin is working on a project that aims to prevent the linkage of individual genomic sequences to resources that contain patient identifiers. The method works by de-identifying data according to HIPAA rules while at the same time supporting GWAS validation and clinical case study research. In a PNAS paper published last spring, Malin and his colleagues demonstrated the effectiveness of their algorithm — called Utility-Guided Anonymization of Clinical Profiles, or UGACLIP — with roughly 3,000 patient electronic medical records from Vanderbilt University Medical Center. UGACLIP can be used to prevent individual re-identification from clinical features while at the same time allowing for patterns of International Statistical Classification of Diseases and Related Health Problems (ICD) codes to be extracted from a data set.
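The flavor of that approach can be conveyed with a toy sketch in Python. The codes below are hypothetical and the logic is a drastic simplification of the published, utility-guided algorithm: roll five-digit ICD-9 codes up to their three-digit categories so that no diagnosis profile is unique, while the broad disease patterns a researcher would mine remain intact.

```python
from collections import Counter

# Hypothetical diagnosis profiles (ICD-9-style codes), one set per patient
profiles = [
    frozenset({"250.01", "401.1"}),  # diabetes subtype + hypertension subtype
    frozenset({"250.02", "401.9"}),
    frozenset({"250.01", "401.9"}),
    frozenset({"272.4"}),            # hyperlipidemia subtypes
    frozenset({"272.0"}),
]

def generalize(code):
    """Roll a five-digit ICD-9 code up to its three-digit category."""
    return code.split(".")[0]

def anonymize(profiles, k=2):
    """Generalize codes, then suppress any profile still shared by fewer
    than k patients (a drastic simplification of UGACLIP's search)."""
    generalized = [frozenset(generalize(c) for c in p) for p in profiles]
    counts = Counter(generalized)
    return [p if counts[p] >= k else None for p in generalized]

out = anonymize(profiles, k=2)
print(Counter(out))
```

After generalization, the three diabetes-plus-hypertension patients share one indistinguishable profile, so an analyst can still count the disease pattern without singling anyone out.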
Malin says one of the biggest challenges with encrypting personal genetic data is that encryption significantly slows down research. "You can encrypt genomic data, and you can even query encrypted data, but the challenge when doing that is that it's not as efficient as you'd want it to be," he says. "If you're just doing this for a single computation in terms of wanting to know what the odds ratio is for a particular patient in terms of whether or not they will respond to a drug, that may be scalable. But when you want to compare that individual against a whole population of records, that becomes a little bit more computationally challenging."
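The cost Malin describes comes from doing arithmetic on ciphertexts instead of plaintexts. A toy Paillier cryptosystem in Python (a standard additively homomorphic scheme with insecurely small, hard-coded primes, shown here for illustration and not as SIMGAP's actual implementation) demonstrates how a server can total allele counts it cannot read:

```python
import math
import random

# Toy Paillier keys: real deployments use primes of ~1024 bits or more
p, q = 104723, 104729
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
mu = pow(lam, -1, n)           # modular inverse of lambda mod n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # blinding factor must be invertible
        r = random.randrange(1, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# A server can sum per-person risk-allele counts without seeing them:
counts = [2, 1, 0, 2, 1]
ciphertexts = [encrypt(m) for m in counts]

encrypted_sum = 1
for c in ciphertexts:
    encrypted_sum = (encrypted_sum * c) % n2  # multiplying adds plaintexts

print(decrypt(encrypted_sum))  # 6
```

Each modular exponentiation here costs vastly more than the integer addition it replaces, which is why such schemes hold up for single queries but strain under population-scale comparisons of the kind Malin describes.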
With this in mind, Malin set out to find a better way to use cryptography that would allow researchers to work with health records and genetic data without altering or suppressing information in a data set. The approach, called SIMGAP — secure information management for genotypes and phenotypes — is an open-source, encrypted approach to managing a database on a secure server designed to mitigate the potential for hacking. "We're working on keeping the data in its most specific state so that any type of question that you would have been able to answer in the original data you would still be able to answer in the shared data. The only difference now is that your results would not reveal what any particular individual has," he says. "What we're saying is that we could encrypt all the information — be it genomic data, clinical data, or demographic data — and when the queries are issued against it, we would ensure that the third party managing that data would not be able to infer what is actually being reported." In essence, SIMGAP shields the researcher from having to worry about the cryptographic component and allows for the security of the system to be directed from behind the scenes so that users could potentially manage that data in a third-party environment. The new database infrastructure is being designed in a generic form so that it could potentially replace any third-party database system, or be extended to federated systems such that the data never leaves a particular organization's database. Malin says that the SIMGAP model could be implemented in a DTC genetic testing IT architecture to allow for secure research projects and collaborations.
Have you seen my genome lately?
The debate surrounding the possible ethical, legal, and privacy implications of having one's genetic information used for research may still be up in the air, but until some regulatory body takes action or legal precedents are set, it's business as usual for DTC genetic testing companies that allow consumers to take part in research studies. At 23andMe, Kompel works to develop policies and protocols surrounding data management for the company's research efforts that are aimed at preserving customer privacy. According to Kompel, this involves not only stripping the data of any identifiers, but also ensuring that the research arm of the operation remains separate from the commercial production environment. "Basically, we try to classify the data, assign proper access levels to it, and monitor how it's accessed and changed," he says. "There's a strict protocol that describes what data crosses a line between the production site where our customer data is stored and the research site where researchers look at the data and analyze any patterns in it. We also use some software tools to map data to the research to make sure that no personal information leaks into the research environment."
deCODEme uses third-party encryption methods to hide personal identifiers attached to incoming data, be it for consumer genomics or research. But Stefánsson is quick to point out that customer data is only stored and analyzed for use within the context of the personal genomics service they are paying for and never for research purposes. "We never use the data for personal genomics for science, so in that way we are very, very different from 23andMe, and we would never dream of using reported phenotypes coming through our direct-to-consumer service in our attempt to make discoveries. I don't think that that's a particularly smart approach," Stefánsson says. "We want to make sure that the customer data will only be used by him or her. Also, the data coming out of the DTC service — the phenotypic data — is just not good enough in most instances, so the assumption that you can easily do science with such data is flawed."
Even proponents of personal genomics like Stefánsson say that it is still early days and that much regulatory oversight is needed as both the public and health care providers have only just begun to acquaint themselves with the technology and its implications. "I'm absolutely convinced that we have yet to figure out how to maximally take advantage of the tests within the health care system and we have to find a place for them within the regulatory network that we have for health care," he says. "We do need more regulatory oversight with DTC genetic testing — we have to make sure that the claims made about protection of privacy are correct and consumers are made aware of the fact that there is no such thing as 100 percent secure data."