SAN FRANCISCO (GenomeWeb) – A study published by Human Longevity this week claims that it is possible to identify individuals based in part on predictions of their faces from their whole-genome sequencing data, but a number of researchers have since questioned those findings.
The reaction following the publication in the Proceedings of the National Academy of Sciences was swift. A number of researchers took to Twitter to voice their concerns, among them Yaniv Erlich, an assistant professor in computer science at Columbia University and chief science officer of MyHeritage, who also posted a critique of the paper to the preprint server BioRxiv the day after the study appeared.
Human Longevity, a San Diego, California-based firm founded by Craig Venter, offers whole-genome and microbiome sequencing as well as a suite of imaging and blood tests designed to assess individuals' health.
Aside from Erlich's BioRxiv critique, other researchers have questioned Human Longevity's claims about the risks of storing personal genomic data in public databases. Still others have said that the debate has helped highlight the impact of preprints on scientific discourse.
The idea that individuals can be identified from genomic information is not new. In fact, Erlich coauthored a study in 2013 in which he and his colleagues showed that they could identify individuals whose genomes were in a public database by matching short tandem repeats on the Y chromosome (Y-STRs) to Y-STR records in public genealogy databases, which are linked to surnames.
In his BioRxiv paper, Erlich said that with 10 minutes' work he could also match individuals to their genomes, with success rates similar to those reported in the PNAS study, using easily accessible demographic information.
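To make the idea of demographic matching concrete, the following is a minimal sketch and not Erlich's actual procedure. It assumes, purely for illustration, that the demographic attributes are sex, age, and a coarse ancestry label; the `Person` class, the `matching_candidates` function, and the five-person pool are all hypothetical. A person counts as re-identified when the attributes narrow the pool to exactly one candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record holding only basic demographic attributes."""
    name: str
    sex: str
    age: int
    ancestry: str


def matching_candidates(pool, sex, age, ancestry):
    """Return everyone in the pool who shares all three attributes."""
    return [
        p for p in pool
        if p.sex == sex and p.age == age and p.ancestry == ancestry
    ]


# Made-up pool for illustration only; not data from either paper.
POOL = [
    Person("P1", "F", 34, "A"),
    Person("P2", "F", 34, "B"),
    Person("P3", "M", 34, "B"),
    Person("P4", "M", 51, "A"),
    Person("P5", "M", 51, "A"),
]

# One candidate remains, so this combination singles out a person.
unique = matching_candidates(POOL, "F", 34, "B")

# Two candidates remain, so this combination does not.
ambiguous = matching_candidates(POOL, "M", 51, "A")
```

In a small pool, even a handful of coarse attributes can leave a single candidate, with no genomic data involved.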
What was novel about the Human Longevity study was the researchers' use of machine learning algorithms to create a sketch of a person's face from whole-genome sequence data. However, Erlich pointed out that in fact, the researchers don't rely much on sequence data to predict traits such as facial structure or height, but rather use the genomic data to predict gender and ancestry and then "infer something that is very close to the population average."
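Erlich's point about population averages can be illustrated with a short sketch. This is not the Human Longevity model; it is a hypothetical baseline, with made-up numbers and invented names (`TRAINING`, `fit_group_means`, `predict_height`), that ignores the rest of the genome and predicts a trait from sex and an ancestry label alone by returning the average for that group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (sex, ancestry label, height in cm).
# These values are invented for illustration and are not from the study.
TRAINING = [
    ("F", "A", 160.0),
    ("F", "A", 164.0),
    ("M", "A", 174.0),
    ("M", "A", 178.0),
    ("F", "B", 166.0),
    ("M", "B", 180.0),
    ("M", "B", 184.0),
]


def fit_group_means(records):
    """Compute the average trait value for each (sex, ancestry) group."""
    groups = defaultdict(list)
    for sex, ancestry, height in records:
        groups[(sex, ancestry)].append(height)
    return {key: mean(values) for key, values in groups.items()}


def predict_height(group_means, sex, ancestry):
    """Predict a trait as the average of the person's group, nothing more."""
    return group_means[(sex, ancestry)]


GROUP_MEANS = fit_group_means(TRAINING)
```

Every person in the same group receives the same prediction under this baseline, which is the sense in which such a prediction sits "very close to the population average."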
The Human Longevity team has said that it plans to reply to Erlich's BioRxiv criticisms, and in the meantime, the company has released a statement saying that "the researchers stand behind their methodology," although they acknowledge that the methods are still in the early stages.
The statement added that a main reason for doing the study was to point out that there is "no such thing as true de-identification and full privacy in publicly accessible databases."
But, Erlich said, researchers have "known for years that genetic information is identifiable."
Hank Greely, director of the Center for Law and Biosciences at Stanford University, said that "of all the genetic privacy issues out there, genomic facial reconstruction is quite low on my list — certainly now and probably forever."
The Human Longevity study is "playing up the privacy threat," he said, adding that he thought such overselling is in part what caused the backlash from other genomics researchers, who rely on individuals to participate in research.
People "routinely give up much more, and often much more sensitive information, through our credit cards, electronic toll paying, cell phones, Google searches, and so on," he said. "To put a different face on it, it seems to me much ado about very little."
Another issue that Erlich and others have raised via Twitter is that the Human Longevity study was originally submitted to Science but rejected, in part due to recommendations from Erlich, who served as a reviewer. The study was subsequently published in PNAS through the journal's "contributed" track. Under that track, members of the National Academy of Sciences are permitted to submit studies, choose their own peer reviewers, and decide how to address reviewers' comments, explained Michael Hoffman, a computational genomics researcher at the University of Toronto.
The process isn't in and of itself problematic, he said, since there are many methods of getting scientific research into the public domain, but it's not necessarily what people think of when they think of peer review.
This case has been interesting, he said, because the study spurred such a backlash, including the publication of Erlich's critique on the BioRxiv server.
"What happened was really interesting and turns a lot of concerns that people have about changing models of peer review on their head," Hoffman said, noting that he is also a BioRxiv affiliate, meaning he provides feedback on the service and helps in screening submitted material.
Often, when researchers want to publish concerns about a study that appeared in a peer-reviewed journal, the process is lengthy and can take months or even a year, Hoffman said. In this case, however, Erlich was able to publish a criticism in just one day. "The fact that [Erlich] was able to put together his criticisms in a short, coherent manuscript definitely raises the level of the discussion," he said.
Before the existence of preprints, journals did not have incentives to publish a researcher's criticisms of a published peer-reviewed study, and study authors did not have incentives to address such concerns, especially if they weren't published. But now, in part because Erlich's criticisms are published and people are talking about them, the Human Longevity team has said it plans to respond. Hoffman added that he gives them credit for agreeing to engage and said it would ultimately make for a more substantive discussion.
He said that this recent episode isn't the first time scientists have critiqued peer-reviewed publications via BioRxiv, citing a study published by Evan Eichler's University of Washington group in Nature Genetics regarding the identification of autism genes, and a subsequent critique on BioRxiv that pointed out a statistical error the group had made.
Hoffman declined to comment on the PNAS study itself, but said that a lot of the negative reaction was due to the fact that Human Longevity has a "financial interest in convincing people that broader sharing of genomic data is unsafe." Overall, he said, the episode is a good reminder that everyone should "be more skeptical" of studies and "not rely on peer reviewers to save them from unwarranted conclusions."