NEW YORK – An international team has characterized the function of genetic variants across more than two dozen tissue types in four individuals of European ancestry who were also assessed by high-quality, long-read sequencing.
"This catalog potentially enables us to bootstrap the determination of allelic variants in other, new individuals in a generalizable way," senior and co-corresponding author Mark Gerstein, a bioinformatics researcher at Yale University, said in an email. "This is particularly true for an individual of the European population."
Using Pacific Biosciences long-read sequencing, Oxford Nanopore long reads, linked-read 10x Genomics profiling, and Illumina short-read sequencing, he and his colleagues were able to generate genome sequences representing both the maternal and paternal haplotype for two male and two female individuals who also participated in the Genotype-Tissue Expression (GTEx) project. The data was analyzed in combination with findings from around 15 functional genomic assays applied to roughly 30 tissues from the participants.
The team described the resulting EN-TEx dataset in a study published in Cell on Thursday. Together with corresponding statistical and deep learning models, EN-TEx is expected to help in annotating and untangling tissue-specific variant effects in personal genome sequences generated in the future.
"With the high-quality genomes and the matched assays and tissues in EN-TEx, the catalog of allele-specific events can help ascertain variant impact in an extremely precise fashion because one has a 'natural control' in comparing the maternal and paternal haplotypes," Gerstein explained, noting that the catalog "lets us develop generalizable models for variant impact."
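The "natural control" idea can be illustrated with a small sketch: at a heterozygous site, reads from the maternal and paternal copies come from the same cells and the same experiment, so a strong skew away from an even split suggests an allele-specific effect. The Python below applies an exact two-sided binomial test to a pair of read counts. It is a simplified illustration under that assumption, not the statistical model used for EN-TEx, and the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(maternal_reads: int, paternal_reads: int) -> float:
    """Exact two-sided binomial test against an even 50/50 split.

    Hypothetical helper for illustration only; real allele-specific
    pipelines also correct for mapping bias and overdispersion.
    """
    total = maternal_reads + paternal_reads
    if total == 0:
        return 1.0  # no reads, no evidence of imbalance
    smaller = min(maternal_reads, paternal_reads)
    # Under the null (p = 0.5) the distribution is symmetric, so the
    # two-sided p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(total, i) for i in range(smaller + 1)) / 2**total
    return min(1.0, 2 * lower_tail)


# Made-up counts: a balanced site and a strongly skewed site.
balanced = allelic_imbalance_pvalue(10, 10)
skewed = allelic_imbalance_pvalue(20, 0)
```

In this toy case the balanced site gives a p-value of 1.0, while the skewed site gives a very small one, which would flag it as a candidate allele-specific event.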
With these models, the team showed that they could, for example, highlight regulatory elements that are overrepresented at expression quantitative trait loci (eQTL) or at loci identified through genome-wide association studies (GWAS), while flagging portions of the genome where variant changes had the most pronounced regulatory impacts.
"Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription factor-binding motifs particularly sensitive to variants," the authors reported. "Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci."
The newly developed EN-TEx models also provided an opportunity for transferring eQTL effects found in an easily accessible tissue, such as blood, to another tissue or organ type, Gerstein said, offering "a significant expansion to the GTEx eQTL catalog, linking uncharacterized variants to genes with known function."
The work expands on earlier efforts to understand regulatory parts of the genome and their effects, such as GTEx and ENCODE, bringing in a wide range of analytical tools, variant types, and datasets to systematically assess more than 1 million allele-specific loci.
The EN-TEx resource currently does not include data on brain tissues, Gerstein noted, adding that it would be beneficial to expand the effort to include such data, along with similar datasets from additional, non-European populations.
"It would be powerful to extend the EN-TEx approach to additional individuals, allowing one to develop QTL-like studies, and then include individuals with different ancestries (e.g., African or Asian ancestry)," he said, noting that the current EN-TEx collection is being made freely available to other investigators.
"We envision that in the near future, with the decreased cost of sequencing, generating a matched personal genome sequence as an accompaniment to each functional genomics experiment will become the norm," he and his co-authors wrote, adding that the "EN-TEx personalized epigenomics approach for analyzing the impact of genome variation will necessarily become commonplace, potentially providing benefits for precision medicine."