New Industry-Academic Consortium Aims to Develop Communal Allele Frequency Data Repository


NEW YORK (GenomeWeb) – Thirteen institutions and life sciences companies have teamed up to launch the Allele Frequency Community, a freely accessible communal repository that will offer access to high-quality and ethnically diverse allele frequency information from human genomes and exomes.

According to the coalition website, the resource, which is currently in beta, provides an environment for securely sharing anonymized, pooled allele frequency statistics that can be used in biomedical research and clinical efforts. Participating labs will be able to share privately generated variant call files with others in the community and will benefit from access to a much broader pool of information and richer annotations than they would have access to on their own.

There are no admission costs for joining or contributing to the repository. Interested researchers simply have to sign up for accounts and then opt in to participate in the community — meaning that they'll contribute variant call files from their own exome and genome datasets to the common pool. They'll then be able to upload and annotate their datasets with allele frequency information from the entire community and also access the frequency information available in the resource. Only anonymous, pooled allele frequencies will be made available to participants in the community, thus protecting patients' privacy.
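To illustrate what sharing only pooled allele frequencies can mean in practice, here is a minimal sketch. It assumes each contributor reports, per variant, an alternate-allele count and a total count of called alleles, and that the pool is formed by summing those counts. The class and function names are hypothetical, and nothing here is confirmed as the consortium's actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AlleleCount:
    """Per-contributor summary for one variant (hypothetical structure)."""
    alt_count: int      # number of alternate alleles observed
    total_alleles: int  # number of alleles called at this site


def pooled_frequency(contributions: Iterable[AlleleCount]) -> Optional[float]:
    """Sum counts across contributors and return one pooled frequency.

    Only the aggregate is returned, so no individual genotype and no
    per-contributor breakdown is exposed to the caller.
    """
    contributions = list(contributions)
    total_alt = sum(c.alt_count for c in contributions)
    total_called = sum(c.total_alleles for c in contributions)
    if total_called == 0:
        return None  # variant site was not called in any contributed dataset
    return total_alt / total_called


# Made-up counts from three hypothetical labs for a single variant.
labs = [AlleleCount(3, 200), AlleleCount(1, 1000), AlleleCount(12, 800)]
freq = pooled_frequency(labs)
```

Summing counts rather than averaging per-lab frequencies weights each contributor by how many alleles it actually observed, which is one common way to combine cohorts of different sizes.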

Anonymized statistics from community members' samples will be used to expand the diversity of the database over time, resulting in improved variant identification and interpretation, according to the consortium. The data is hosted and accessible via bioinformatics infrastructure developed and maintained by Qiagen, one of the founding members of the consortium.

This is the second community-based effort that Qiagen has launched in recent times. In 2013, the company offered free use of its Ingenuity Variant Analysis software as part of an effort called the Empowered Genome Community. That initiative was intended to provide a secure collaborative environment in which individuals who'd had their genomes sequenced could share their data with one another and access tools to interpret it.

The current endeavor around allele frequency information grew out of conversations that began last fall between the company and some of its customers in the translational medicine and clinical research arenas, Laura Furmanski, head of Qiagen's bioinformatics business, told GenomeWeb this week. Those conversations covered the benefits that could accrue from ongoing big data initiatives and the mechanisms by which the company could provide support and help researchers make maximal use of their research. "We quickly centered in on the idea of this allele frequency community, which [would provide] a space where researchers and clinicians would be able to share their data in a very safe and compliant fashion, and would also be able to benefit from the information that others would be able to provide to that community as well."

"Over the last few years, access to allele frequency data from large populations has been the most useful resource for the interpretation of human variation," Heidi Rehm, director of the Laboratory for Molecular Medicine at Partners Healthcare Personalized Medicine, one of the members of the Allele Frequency Community, said in a statement. "The Allele Frequency Community is a really valuable project. I am happy to share data through this new resource and excited that many other people have agreed to do so as well."

"[It's] more than just a simple repository; it is a dynamic resource that has been designed to grow and become more informative through more use by members of the community," John Niederhuber, CEO at Inova Translational Medicine Institute, another founding partner, noted in a statement. "Large-scale datasets of diverse allele frequency data are critical to advancing personalized medicine, and ... by taking advantage of anonymized pooled data, this project will support patients and clinicians who have struggled to identify the elusive genetic changes that are necessary to diagnose and treat complex diseases."

Increased participation and contributions to the resource are expected to create greater value over time for life sciences and clinical research. Information on observed allele frequencies can create important benchmarks that improve the accuracy of findings from data generated by molecular analyses, such as from next-generation sequencing-based studies, but the research and clinical communities currently lack an "extensive, high-quality, ethnically diverse collection of human genomes as a reference dataset," the partners note on the consortium's website.

Resources such as the Exome Variant Server, the 1000 Genomes Project, and the Exome Aggregation Consortium "have been immensely valuable to the community" and tools like Kaviar combine "such datasets into integrated allele frequencies, but public databases have not been funded to provide broad and deep ethnic representation," the members note. That's important because prospective disease-causing variants that might be rare based on the information provided in current public resources may be more prevalent in ethnic populations that aren't well represented in these repositories.

The consortium aims to address this challenge by providing frequency information based on sequence datasets that better reflect the diversity in the human population. So far, the 13 founding partners have contributed over 70,000 datasets — including 8,000 whole genomes — from individuals from over 100 countries. According to internal benchmarking studies done by Qiagen, using the information contained in the resource resulted in a 43 percent average reduction in false positive rates in causal variant identification without a corresponding dip in the number of true variants identified.

That percentage is based on preliminary results from exploring a representative group of solved whole-genome diagnostic case studies. In those studies, Qiagen researchers compared the number of false positive variants with and without the added filters that the allele frequency information in the community resource provides, Doug Bassett, Qiagen Bioinformatics' CSO, told GenomeWeb in an email. As a baseline, the company used the Ingenuity Variant Analysis software's filter cascade, which filters variants based on frequency information from the 1000 Genomes Project, Complete Genomics Diversity Panel, and Exome Variant Server.

"As the community grows, we will continue to monitor false positive reduction rates and publish more findings as they become available," he said.
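The kind of comparison described above, a baseline frequency filter versus the same filter with community frequencies added, can be sketched roughly as follows. This is an illustration only: the 1 percent threshold, the variant names, and all frequencies are invented, and the code does not represent the Ingenuity Variant Analysis filter cascade or Qiagen's benchmarking method.

```python
from typing import List, Optional


def passes_filter(freqs: List[Optional[float]], threshold: float = 0.01) -> bool:
    """Keep a candidate only if no source reports it above the threshold.

    None means the source has no observation for the variant.
    """
    return all(f is None or f <= threshold for f in freqs)


# Toy candidates: frequencies from two public sources plus a community pool.
candidates = {
    "var1": {"public": [0.0005, None], "community": 0.0004},  # rare everywhere
    "var2": {"public": [0.002, 0.001], "community": 0.06},    # common in community data
    "var3": {"public": [None, None], "community": 0.03},      # absent from public sources
    "var4": {"public": [0.2, 0.15], "community": 0.18},       # common everywhere
    "var5": {"public": [None, 0.003], "community": 0.002},    # rare everywhere
}
causal = {"var1"}  # the known answer in this made-up solved case

baseline = [v for v, d in candidates.items() if passes_filter(d["public"])]
extended = [
    v for v, d in candidates.items()
    if passes_filter(d["public"] + [d["community"]])
]

# False positives are surviving candidates that are not the causal variant.
fp_before = len(set(baseline) - causal)
fp_after = len(set(extended) - causal)
reduction = (fp_before - fp_after) / fp_before
kept_causal = causal <= set(extended)  # the true variant must still survive
```

In this toy case the added community frequencies remove two of three false positives while the causal variant is retained, which mirrors the shape of the claim rather than its reported magnitude.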

Qiagen is hosting the allele frequency data on the same infrastructure that supports its existing fleet of NGS and variant data analysis solutions as well as its planned clinical decision support software solution. However, researchers don't have to be Qiagen customers to join the community and to access and contribute to the resource, Qiagen's Furmanski told GenomeWeb.

Consortium members can currently only explore the allele frequency data using Ingenuity Variant Analysis, but the company plans to make the information accessible via other tools in its portfolio, including CLC Cancer Research Workbench, CLC Genomics Workbench, and Ingenuity Clinical decision support, which is currently in development. The community database accepts VCF files as input, but participants who have BAM, SAM, or FASTQ files will be able to process them using the CLC Workbench prior to submission.

Furmanski also said that Qiagen will offer free trials of its software for community members who aren't current customers, and they'll have the option to license the company's solutions.

The Allele Frequency Community will be officially introduced at the Advances in Genome Biology and Technology meeting being held in Marco Island, Fla., this week. Besides Qiagen, Partners Healthcare, and Inova, other members are Columbia University Institute for Genomic Medicine; Emory Genetics Laboratory; Erasmus University Medical Center; Icahn Institute for Genomics and Multiscale Biology at Mount Sinai; the Institute for Systems Biology; Laboratory Corporation of America; New York Genome Center; University of British Columbia; University of Washington; and Weill Cornell Medical College.