NEW YORK – A new project aims to address an ongoing bias in genomics toward European populations that organizers maintain has resulted in misdiagnoses, misunderstandings, and inconsistent care in non-European communities.
The effort, called Link23, is supported by Genomics England and Data Science for Health Equity (DSxHE). It recently went live with its platform, through which Link23 intends to build a community as well as provide a wealth of analytical tools to improve equity in genomics.
Maxine Mackintosh, program lead on diverse data at Genomics England, is leading Link23. According to Mackintosh, while it is understood in the genomics community that Europeans are overrepresented in datasets, which can have a negative impact on the quality of genomic medicine available to people of other ancestries, the issue has not yet been sufficiently addressed.
"The narrative is centered on the need to collect more data," said Mackintosh, "which is incredibly important, but generally in the community there is an underappreciation for the impact of changing models and using new approaches."
In order to make genomic medicine more equitable, Mackintosh holds, the answer is therefore to not only create and include new datasets but also rework how data from diverse populations is analyzed and put to use. "It's not always a big data problem, but a behavioral problem that can have technical solutions," she said.
While other avenues for addressing inequalities in genomics exist, such as the Global Alliance for Genomics and Health (GA4GH), an international consortium focused on setting standards for managing genomics data, the genomics community could benefit from looking beyond its existing institutions for ideas on how to improve equity.
"I have found that the genomics world is very isolated in the genomics context," remarked Mackintosh. "There are so many tools and methods that exist outside the world of genomics that have no way of seeping or leaking in."
Mackintosh and colleagues conceived of Link23 earlier this year. While organizers are affiliated with Genomics England and DSxHE, Link23 is an independent endeavor. Over the summer, the organizers developed Link23's structure and focus of activities, which resulted in a launch last month. However, as Mackintosh noted, the effort is still in its infancy. "A launch means that we are a space on the internet where you can see that these things exist."
As noted on its website, Link23 wishes to "curate, build, implement, and scale practical solutions that make genomics as equitable as possible." It will accomplish this by providing tools, products, resources, and handbooks to the genomics community, serving as a place for collaboration, and fostering a global community.
Its toolbox currently contains upwards of three dozen tools and resources. These range from Wellcome's "Anti-racist principles, guidance and toolkit," a collection of principles published by the Wellcome Trust, to GA4GH's Genomic Data Diversity Standard, which offers guidance for more inclusive genomic research and health implementation.
There are also purely analytical tools, such as Globetrotter, an algorithm developed by Garrett Hellenthal's group at University College London for population admixture analysis, and even some software developed by private companies, such as BenevolentAI's Diversity Analysis Tool. The latter can produce demographic analysis reports based on health datasets, with fields for age, sex, ethnicity, race, and socio-economic status. BenevolentAI is headquartered in London and relies on artificial intelligence for drug discovery and development.
Aishaini Puvanendran, lead technical program manager for BenevolentAI, said the company is "excited to be a tooling partner for the Link23 initiative." As data and knowledge are the foundations of its AI-enabled drug discovery approach, BenevolentAI is working to improve the diversity of that data to better represent human biology and to develop better medicines.
In addition to providing its Diversity Analysis Tool to the Link23 community, BenevolentAI has been involved in the initiative's early planning, and is working to "raise awareness and build solutions to lessen the immediate effect of poor diversity in genomics data," Puvanendran said.
Later this month, Link23 will host a kickoff meeting and commence its next phase of activities. In December, it will also host a workshop focused on how it adds, vets, and curates new tools for users. In general, Link23 will host three types of tools: those built specifically for an equity purpose, common bioinformatics tools with an equity component, and tools that affect behaviors and engagement. "Those will be the three highest levels of taxonomy," Mackintosh said.
In coming months, Link23 will also build out community infrastructure, decide on a system of governance, and begin working on a series of challenges, where it will work with partners to solve "achievable and relevant problems" that can be piped through the community.
One confirmed partner is the GWAS Diversity Monitor, an interactive real-time dashboard that tracks population diversity in genome-wide association studies. Mackintosh noted that future partners could benefit from taking part in such challenges, as it would help them fine-tune their tools for wider adoption.
"All of a sudden, there is a global network of specialists that can work on improving it," said Mackintosh. She said that a potential partner could then say that its tool has been enhanced by the Link23 community.
Such challenges will be a focus for Link23 next year, Mackintosh forecast. "We will start piping proper problems through the Link23 community," she said, "so it is not just a list of tools."
The equity agenda
In April, Mackintosh and partners at the University of Oxford and University College London published a preprint on bioRxiv, entitled "Optimal strategies for learning multi-ancestry polygenic scores vary across traits." In it, the authors noted that polygenic scores have been less accurate in people with non-European ancestry, and that while there has been a greater effort to train scores based on diverse populations, the best way to maximize performance is unknown.
In the paper, the authors looked at the effect of sample size and ancestry composition on the performance of polygenic scores for 15 traits in UK Biobank data. They found that some scores based on a small training set of individuals of African ancestry outperformed scores estimated based on a European-ancestry set, leading them to conclude that targeted data collection from underrepresented groups could address disparities in the performance of polygenic scores.
Mackintosh described ancestry-related issues with polygenic risk scores as the "canonical example" of inequality in genomic medicine, but noted there are others. She cited a 2016 paper in the New England Journal of Medicine that reported that variants previously classified as causing hypertrophic cardiomyopathy were actually common among Americans with African ancestry and were in fact benign, which had resulted in misdiagnoses.
Mackintosh took a nuanced view toward resolving such equity issues, noting that some traits are "very transferable" across ancestries and so do not require large sample sizes to improve risk scores. "Some polygenic scores are susceptible to ancestry differences, and others are not," she said. "You can't necessarily say that polygenic scores are always non-transferable across ancestries."
One outstanding question is why these issues have not been addressed to date. According to Mackintosh, the narrative around the equity agenda has become stronger in the last 10 years, and Link23 is the next chapter in the story of attaining equity in genomics.
"Some people might push back at the fact that this wasn't done before," said Mackintosh. "But by bringing together different approaches and different models, you can have just as big an impact on equity as if you spent hundreds of thousands of pounds to sequence more whole genomes."