NEW YORK – A Chinese-Italian research team has developed a new open-source genetic reporting tool for individuals who have obtained their own genomic data from consumer genomics providers.
Called Personal Access to Genome and Analysis of Natural Traits, or PAGEANT, the platform enables users to analyze and interpret their genomic data at no cost. Both SNP array and next-generation sequencing data can be analyzed using the new resource, which was described in a Nucleic Acids Research paper published last month.
The collaboration that produced PAGEANT evolved out of a relationship between scientists at the University of Camerino in Italy and the Peking University School of Public Health in Beijing. According to corresponding author Valerio Napolioni, a geneticist at the University of Camerino, he previously cooperated with Jie Huang's group at Peking University to produce a tool called PERHAPS that supports paired-end short reads-based haplotyping from next-generation sequencing data. PAGEANT is the result of several projects that are underway.
"We both studied human genomics and were interested in creating a free and reliable version of a genome interpretation tool, which is essential for science to be translated into health and knowledge," said Huang.
The argument that people holding their genomic data, acquired via ancestry and health and wellness providers such as Ancestry, 23andMe, MyHeritage, or Family Tree DNA, are not being well served is a central tenet of PAGEANT's creators. This is no small number. According to the paper, as of 2018, more than 25 million people worldwide had taken one of these consumer tests.
Yet even though scientists have been churning out loci of interest via countless genome-wide association studies over the past decade and a half, the researchers maintain in the paper that insights gleaned from these studies have yet to make it sufficiently into public use.
"The public lack the means to avail such data for interpretation of their own genomes," the authors write in the new paper. Direct-to-consumer testing, meanwhile, is "under strict government regulation" and faces multiple challenges, such as concerns around psychological impact, lack of access to genetic counseling, and the lack of validity or utility of results.
A wide range of free academic third-party tools, varying in scope and accessibility, already exists for generating similar results. The new paper identifies nearly 30 such tools, although the list continues to fluctuate as new ones are deployed and older ones are deactivated. For instance, DNA.Land, a data upload site that provided free ancestry and trait reports and cousin matching, went offline in November. Other sites, such as Promethease, recently transitioned to a for-profit model and were not considered direct competitors to PAGEANT. Interpretome, a somewhat similar tool, has also been deactivated.
In the paper, the researchers identified the two existing tools most similar to PAGEANT, Impute.me and openSNP. Impute.me was developed by Danish researchers at Sankt Hans Hospital in Roskilde in 2015. OpenSNP was developed by a German research team in 2011.
PAGEANT's makers benchmarked the new tool against Impute.me in the development process.
They also used genome data from the 1000 Genomes Project as well as GWAS summary statistical data from the COVID-19 Host Genetics Initiative. They were supported financially by the National Key Research and Development Program of China and the Peking University Research Initiation Fund, as well as Innova Package, a Fujian, China-based company, and some personal funds.
Huang said that PAGEANT differs from most tools out there, commercial or third party, because of its privacy and security features, as well as its capacity for customization. "We value the confidentiality of human genome data," noted Huang. "Therefore, PAGEANT can be run locally without sending any data to a remote server." In addition, each line of PAGEANT's source code is open, and its analysis tool can be refitted as new scientific papers are published.
"If a user feels that it is more accurate to predict his genetic risk for lung cancer based on a million genetic variants reported by a new paper, he could easily do that in PAGEANT," he said.
The ACGTU philosophy and 5Q design
PAGEANT is organized around five philosophies, according to the authors, which they summarize as ACGTU, the five letters that denote the bases of nucleic acids. A stands for academic quality and standards. C is for confidential data run locally. G refers to a generalizable architecture and algorithm. T is for the transparency of its source code, and U stands for user-centric, as users can add or remove traits from a genetic report.
These five philosophies are combined with what the authors call PAGEANT's 5Q design. The 5 "Qs" are quality control of genetic data; qualitative assessment of genetic characteristics of absolute or high certainty; quantitative assessment of health risk susceptibility based on polygenic risk scores; querying of third-party variant databases such as ClinVar and PharmGKB; and generating secure quick response codes for tagging individual genomes.
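For readers unfamiliar with the quantitative step, a polygenic risk score is commonly computed as a weighted sum: each variant's published effect weight is multiplied by the number of effect alleles a person carries, and the products are added up. The Python sketch below illustrates only that general idea. It is not taken from PAGEANT's source code, and the function name, variant IDs, and weights are all hypothetical.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant).

    Variants that have a weight but are missing from the genotype data are
    skipped here; real tools may impute or otherwise handle them.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical variant IDs and effect weights, for illustration only.
weights = {"rsA": 0.25, "rsB": -0.5, "rsC": 0.125, "rsD": 1.0}
# Hypothetical genotype: copies of the effect allele carried at each variant.
dosages = {"rsA": 2, "rsB": 0, "rsC": 1}

score = polygenic_score(dosages, weights)
print(score)
```

Swapping in a different weights table, for example one published in a newer paper, changes the score without touching the function, which is the kind of customization Huang describes.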
Huang called PAGEANT's ACGTU philosophy and 5Q design approach "revolutionary" and said the researchers hope their perspective and methods will "gradually be viewed as a standard by the DTC field."
Additional activities related to PAGEANT are in progress, Huang noted. The researchers have secured an undisclosed amount of funding from Peking University to establish PAGEANT.me, and they will continue to improve the platform. Huang stressed that since PAGEANT runs on a user's desktop, it does not collect data from users for use in other genetic studies, for example.
"We understand that a user’s genetic data is confidential and private, and we have no intention and it is none of our business to collect any data," said Huang. "We simply want to promote an academic [tool] that hopefully could be widely adopted."
A valuable addition
Cathryn Lewis, a professor of genetic epidemiology and statistics at King's College London, is familiar with third-party analysis tools and polygenic risk scores. She co-authored a review of such tools in the journal Genome Medicine last year. She has also collaborated with Lasse Folkersen, one of the developers of Impute.me, and this month published a tool for translating polygenic scores onto the absolute scale using summary statistics in the European Journal of Human Genetics.
When asked about PAGEANT, Lewis called the platform a "valuable addition to the open-source, open-access landscape." According to Lewis, the need for tools like PAGEANT, as well as Impute.me, "highlights a growing discontinuity in the provision of genetic data." While whole-genome sequencing for rare disorders is becoming ubiquitous, Lewis said that tests that link common variation to complex diseases are not yet widely available.
"This leaves people to seek information on their genetic susceptibility through direct-to-consumer providers, and then use independent software for polygenic scores," said Lewis.
Also, while tools like PAGEANT might provide accurate information to users, Lewis said that it is still unknown how people are interpreting their results, and if the reports of increased or decreased risk are leading to increased sharing of data with clinical providers or changes in lifestyle. She noted that a score in the 90th percentile might be alarming to a user, but one in 10 people will have a similar or higher risk score.
"In interpreting a polygenic score, knowing where our score lies relative to others is a core piece of information," said Lewis. Users should therefore know their relative risk compared to the population prevalence, as well as their absolute risk. "Accurate software is only the first piece of giving risk information, and it is essential to be paired with clear communication of how to interpret our results," Lewis said.
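To make the distinction between relative position and absolute risk concrete, here is a minimal Python sketch. It assumes, purely for illustration, that scores in the reference population are normally distributed and that absolute risk can be approximated as relative risk multiplied by population prevalence. Neither assumption comes from the PAGEANT paper or from Lewis's published method, and the function names are hypothetical.

```python
from statistics import NormalDist


def score_percentile(score: float, ref_mean: float, ref_sd: float) -> float:
    """Percent of a normally distributed reference population scoring below `score`."""
    return 100.0 * NormalDist(mu=ref_mean, sigma=ref_sd).cdf(score)


def absolute_risk(relative_risk: float, prevalence: float) -> float:
    """Simplified approximation: relative risk times prevalence, capped at 1."""
    return min(1.0, relative_risk * prevalence)


# A score about 1.28 standard deviations above the mean sits near the 90th
# percentile, so roughly one person in 10 has a similar or higher score.
pct = score_percentile(1.2816, ref_mean=0.0, ref_sd=1.0)
share_at_or_above = 100.0 - pct

# Illustrative numbers: doubling a 5 percent population prevalence
# still leaves an absolute risk of about 10 percent.
risk = absolute_risk(relative_risk=2.0, prevalence=0.05)
print(round(pct, 1), round(share_at_or_above, 1), round(risk, 3))
```

The same percentile reads very differently depending on how common the condition is, which is why a report that shows both figures is easier to interpret than one that shows either alone.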
Sarah Nelson, a research scientist at the University of Washington's department of biostatistics, authored a review of third-party genetic interpretation tools in 2018 in the Journal of Genetic Counseling. She noted that there is "a lot of flux" in terms of available tools, as new ones appear and others are shuttered over time.
"Even though third-party tools aren't going away as a class of consumer products, there have been a lot of changes," said Nelson, noting that DNA.Land shut down in November, and that GEDmatch was acquired by Verogen, a forensics company, in December 2019.
"Tools come and go, they are more transient than DTC companies, though there is flux in that market as well," Nelson pointed out. She added that such volatility could be "worrisome" for users who choose to upload their data to sites that collect anonymized raw data, an issue that PAGEANT has addressed by allowing users to run the analysis locally on their desktops.
"Even if this tool goes away in two years, it's not like they have thousands of raw data profiles sitting on a server somewhere," Nelson said.