NEW YORK (GenomeWeb News) – Members of the Personal Genome Project and their collaborators have developed a community resource aimed at improving the analysis and interpretation of human genomes.
In a study slated to appear online in the Proceedings of the National Academy of Sciences this week, Harvard University geneticist and PGP founder George Church and his co-authors from several institutes around the world described the software, dubbed Genome-Environment-Trait Evidence, or GET-Evidence.
Spurred, in part, by variant patterns found in the 10 whole genomes sequenced by Complete Genomics for PGP-10, the pilot phase of PGP, the team developed the GET-Evidence tool as a public resource for prioritizing and interpreting variants of potential clinical interest in individual genomes.
This system "facilitates whole-genome interpretation by creating an interpretation pipeline that combines genome data processing, prioritization of variants for review, and recording of variant evaluations," the study authors explained.
By listing which variants in a given genome have or have not been found in previously analyzed genomes, the software provides a framework for considering both known and rare variants as more genomes are assessed. The result is a growing resource to assist in studies of genotype-phenotype interactions and the environmental factors influencing them.
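As a rough illustration of that known-versus-new split, the sketch below partitions one genome's variants by whether they appear in a set of variants seen in earlier genomes. It is a minimal sketch and not GET-Evidence's actual code: the `Variant` record, the function names, and the sample coordinates are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical variant record; field names are assumptions."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_by_prior_observation(
    genome_variants: Iterable[Variant],
    previously_seen: Set[Variant],
) -> Tuple[List[Variant], List[Variant]]:
    """Return (seen_before, not_seen_before) for one genome."""
    seen_before: List[Variant] = []
    not_seen_before: List[Variant] = []
    for variant in genome_variants:
        if variant in previously_seen:
            seen_before.append(variant)
        else:
            not_seen_before.append(variant)
    return seen_before, not_seen_before


def record_genome(
    genome_variants: Iterable[Variant],
    previously_seen: Set[Variant],
) -> None:
    """Add this genome's variants to the shared set for later comparisons."""
    previously_seen.update(genome_variants)


# Made-up example data, for illustration only.
seen = {Variant("chr1", 100, "A", "G")}
genome = [Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")]
known, novel = split_by_prior_observation(genome, seen)
record_genome(genome, seen)
```

Because each genome is recorded after it is compared, the shared set grows with every analysis, so fewer variants in later genomes land in the not-seen-before list.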
"By creating such a shared central resource for recording interpretations, GET-Evidence can act as a forum for building consensus on interpretation," Church and his colleagues wrote.
The PGP was launched in 2007 with the goal of sequencing individuals with a range of documented phenotypes.
To be eligible for the study, participants must demonstrate that they understand the protocols and risks of the PGP study itself as well as broader concepts related to genetics and human subject research. They also agree to an open informed consent policy that waives the expectation of genetic privacy or privacy surrounding health information, with some study volunteers choosing to identify themselves outright.
By compiling genomic, medical, and other information for each individual — and generating cell lines from participant samples that can be obtained by other researchers via the Coriell Cell Repositories — the PGP researchers aim to establish a "public resource where participants acknowledge and agree to the potential risk of reidentification."
"This public resource not only shares genome data publicly but brings these together with publicly shared phenotype information, genetic interpretations, and cell lines," they added, explaining that "such integrated data means the PGP can provide common ground for many types of genome research."
At the Genomes, Environments, Traits conference in Cambridge, Massachusetts, in 2010, Church said the PGP had secured institutional review board approval to study as many as 100,000 individuals, a figure echoed in the new PNAS paper.
With more than 1,800 individuals enrolled in PGP as of May, researchers involved in the effort are continuing to work out methods for analyzing the tide of genomic data the project will face in the future.
"Beyond generating an initial public resource of linked genotype and phenotype data, a key goal of our pilot was to develop and prototype methods for interpreting genome information and making these interpretations public," the team wrote.
To that end, the researchers assessed variant patterns in the genomes of PGP-10 pilot participants to look for information that could aid in the interpretation of other genomes.
For instance, the team identified 3.2 million substitutions per person, on average, when they compared PGP-10 participants' genome sequences, generated from Epstein-Barr virus-transformed white blood cell lines, with build 37 of the human reference genome.
Of these, an average of 8,250 substitutions per individual are predicted to be non-synonymous changes that alter the sequence of a resulting protein, they noted. Among them are variants previously proposed as disease risk alleles that did not fit with the phenotypes of the PGP-10 participants, who appeared to be disease-free when their samples were obtained.
To prioritize such variants more accurately, the researchers developed the GET-Evidence method, which compares variants in each new genome with those assessed before it and assigns prioritization scores to help home in on variants expected to be clinically relevant. These scores take into account not only the strength of the evidence behind a given variant's proposed effect, but also the apparent clinical importance of the variant.
"Variant evidence scores and clinical importance scores are used to generate an overall assessment of evidence (uncertain, likely, or well-established) and clinical importance (low, moderate, or high)," the researchers explained.
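The step from numeric scores to those two categorical assessments can be sketched as a simple binning exercise. This is only an illustration under stated assumptions: the article does not give the scoring scale or cut-offs, so the cut-off values and function names below are hypothetical. Only the category labels come from the study authors' description.

```python
from bisect import bisect_right
from typing import Dict, Sequence

# Category labels as quoted from the study authors.
EVIDENCE_LABELS = ("uncertain", "likely", "well-established")
IMPORTANCE_LABELS = ("low", "moderate", "high")

# Hypothetical cut-offs: the real thresholds are not stated in the article.
EVIDENCE_CUTOFFS = (3, 5)
IMPORTANCE_CUTOFFS = (3, 5)


def bin_score(score: float, cutoffs: Sequence[float], labels: Sequence[str]) -> str:
    """Map a numeric score to a label; a score equal to a cut-off takes the higher bin."""
    return labels[bisect_right(cutoffs, score)]


def assess(evidence_score: float, importance_score: float) -> Dict[str, str]:
    """Combine the two binned scores into one overall assessment."""
    return {
        "evidence": bin_score(evidence_score, EVIDENCE_CUTOFFS, EVIDENCE_LABELS),
        "clinical_importance": bin_score(
            importance_score, IMPORTANCE_CUTOFFS, IMPORTANCE_LABELS
        ),
    }


# Made-up scores, for illustration only.
example = assess(evidence_score=4, importance_score=6)
```

Keeping evidence and clinical importance as separate axes means a variant can be flagged as, for example, high importance but uncertain evidence, which is the kind of case a reviewer would likely want to revisit as new research appears.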
Information on the variants can be updated by the software's users as related research becomes available, the study authors noted, and the GET-Evidence format makes it possible to add annotations or summaries that link variants to related publications.